Preface



This volume constitutes the proceedings of the Second Symposium on Programs 
as Data Objects (PADO-II), held at the University of Aarhus, Denmark, on May 
21-23, 2001. PADO-II was colocated with the Third International Workshop on 
Implicit Computational Complexity (ICC 2001) and the Seventeenth Conference 
on the Mathematical Foundations of Programming Semantics (MFPS XVII).

The first PADO was organized by Harald Ganzinger and Neil Jones, in 1985. 
This second symposium took place on the occasion of Neil Jones’s 60th birthday,
and at his wish, we organized it as a research event. The call for papers was
open and elicited 30 submissions from 12 countries. Overall, 145 reviews were 
collected, and based on these, the program committee selected 14 papers for 
presentation. With one exception, each submission received at least 4 reviews. 
Where relevant, a transcript of the (electronic) PC meeting was also enclosed. 

PADO-II was sponsored by BRICS^ and the Esprit Working Group APPSEM, 
and organized in cooperation with the European Association for Programming 
Languages and Systems (EAPLS) and the Special Interest Group on Program- 
ming Languages of the Association for Computing Machinery (ACM SIGPLAN). 
We gratefully acknowledge their support. 

We also extend our thanks to the PC members and external reviewers for 
their time and thoughts, Janne Kroun Christensen and Karen Kjær Møller for
their organizational help, the <bigwig> project for hosting our submission web 
site, and Daniel Damian for setting it up and maintaining it. 



February 2001 



Olivier Danvy and Andrzej Filinski 



^ Basic Research in Computer Science (www.brics.dk), 
funded by the Danish National Research Foundation. 
Program Analysis for Implicit Computational Complexity



Neil D. Jones 



DIKU, University of Copenhagen 
Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark
neil@diku.dk 



This talk brings together ideas from two lines: automatic estimation of program 
running times, and implicit computational complexity. It describes ongoing re- 
search. Recent work in the two areas has been done by Bellantoni and Cook, 
Benzinger, Hofmann, Jones, Crary and Weirich, Leivant, Marion, Schwichten- 
berg, and others. 

A main goal of implicit computational complexity is to “capture” complexity 
classes such as PTIMEF (polynomial-time computable functions) by computing
formalisms that do not impose explicit bounds on time or space resources. Several 
researchers have succeeded in reaching this goal, a well-known example being the 
Bellantoni-Cook “safe primitive recursion on notation.” 

It must be said, however, that recursion-theoretic formalisms such as primi- 
tive recursion are not very close to programming practice. In particular natural 
algorithms, as seen in introductory algorithm courses, often do not fall into 
existing implicit complexity classes. In some cases this has even been proven im- 
possible, e.g., Colson established that primitive recursion alone cannot express 
computing the minimum of two numbers by the obvious linear-time algorithm. 

In this work we identify a decidable class of algorithms such that all can be 
executed within polynomial time (or logarithmic space); moreover, the class
includes many natural algorithms that are used in solving real problems. 

For a standard first-order functional language we devise a type system giv- 
ing information on the variations of its function parameters in terms of program 
inputs, and on run-time bounds for program-defined functions. Every syntacti- 
cally correct program is well-typed, i.e., the language has a so-called “soft” type 
system. 

The type information is extracted by data-flow analysis algorithms that ex- 
tend the “size-change” framework of our POPL 2001 paper to account for run- 
ning times as well as termination. The analysis allows automatic detection of 
programs that are guaranteed to run (or be runnable) in polynomial time. 

Theorems are proven that this is indeed the case; and that the class is a 
proper generalization of “safe recursion” and some related schemes provided by 
other researchers. Several representative natural and efficient algorithms are seen 
to fall into the class, providing evidence that the class is “large enough.” 
Deriving Pre-conditions for Array Bound Check Elimination



Wei-Ngan Chin, Siau-Cheng Khoo, and Dana N. Xu 



School of Computing 
National University of Singapore 
Abstract. We present a high-level approach to array bound check optimization that is neither hampered by recursive functions nor disabled by the presence of partially redundant checks. Our approach combines a forward analysis that infers precise contextual constraints at designated program points with a backward method that derives a safety pre-condition for each bound check. Both analyses are formulated with the help of a practical constraint solver based on Presburger formulae, resulting in an accurate and fully automatable optimization. The derived pre-conditions are also used to guide bound check specialization, for the purpose of eliminating partially redundant checks.



1 Introduction 

Array bound check optimization has been extensively investigated over the last three decades […], with renewed interest as recently as […]. While successful bound check elimination can bring about measurable gains in performance, the importance of bound check optimization goes beyond these direct gains. In safety-oriented languages, such as Ada or Java, all bound violations must be faithfully reported through a precise exception-handling mechanism. As a result, the presence of bound checks can interfere with other program analyses. For example, a data-flow-based analysis must take into account the potential disruption of control flow should an array bound violation occur.

In this paper, we provide fresh insights into the problem of array bound 
check elimination, with the goal of coming up with a much more precise inter- 
procedural optimization. 

Let us first review the key problem of identifying bound checks for elimina- 
tion. In general, under a given context, a check can be classified as either: 

- unsafe;

- totally redundant; or

- partially redundant.

A check is classified as unsafe if either a bound violation is expected to occur or its safety condition is unknown; such a check cannot be eliminated. A check is classified as totally redundant if it can be proven that no bound violation will occur^. Lastly, a check is said to be partially redundant if we can identify a pre-condition that ensures that the check becomes redundant.

Note that the classification of a check depends upon a given context. Specifically, a partially redundant check under a given context can become totally redundant when the context becomes “stronger”. (Contexts are expressed as predicates.)

To illustrate these three types of checks, consider the following two functions, 
expressed in a first-order functional language. 

newsub(arr, i, j) = if (0 ≤ i ≤ j) then L1@H1@sub(arr, i) else −1

last(arr) = let v = length(arr) in L2@H2@sub(arr, v)

Arrays used in this paper are assumed to start at index 0, and can be accessed by primitive functions, such as sub. Furthermore, we annotate each array-access sub call with check labels, e.g. L1 and H1, to identify the low and high bound checks respectively. The first function, newsub, accesses an element of an array after performing a test on its index parameter i. For the access sub(arr, i) to be safe, both a low bound check $L1 = i \ge 0$ and a high bound check $H1 = i < \mathit{length}(arr)$ must be satisfied.

Under the context $0 \le i \le j$ of the if-branch, we can prove that L1 is totally redundant, but the same cannot be said about H1. In fact, the H1 check is partially redundant: it could be made redundant under appropriate pre-conditions, e.g. $j < \mathit{length}(arr)$.

The second function is meant to access the last element of a given array, but it contains a bug. While the indices of our array range from 0 to $\mathit{length}(arr) - 1$, this function uses an index outside this range. Hence, its upper bound check $H2 = v < \mathit{length}(arr)$ is unsafe, as it contradicts the assertion $v = \mathit{length}(arr)$ from the let statement.
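The three-way classification can be illustrated with a small sketch. The paper decides these questions with a Presburger constraint solver; as a stand-in assumption, the sketch below simply enumerates integer assignments over a small bounded range, which is only a finite approximation and not the authors' method. The names `classify` and `BOUND` are hypothetical, and the contexts used for newsub and last are the ones stated in the text, writing a for length(arr).

```python
from itertools import product

# Hypothetical finite search space standing in for a Presburger solver:
# conclusions are only valid for assignments inside this range.
BOUND = range(-3, 6)

def classify(ctx, check, nvars):
    """Classify a bound check under a context by brute-force enumeration.

    ctx, check: predicates over the same nvars integer size variables.
    """
    reachable = [p for p in product(BOUND, repeat=nvars) if ctx(*p)]
    if all(check(*p) for p in reachable):
        return "totally redundant"    # the context implies the check
    if not any(check(*p) for p in reachable):
        return "unsafe"               # the check fails whenever the context holds
    return "partially redundant"      # the check holds only under a stronger context

# newsub: variables (a, i, j); context a >= 0 and 0 <= i <= j
newsub_ctx = lambda a, i, j: a >= 0 and 0 <= i <= j
print(classify(newsub_ctx, lambda a, i, j: i >= 0, 3))  # L1: totally redundant
print(classify(newsub_ctx, lambda a, i, j: i < a, 3))   # H1: partially redundant

# last: variables (a, v); context a >= 0 and v = a (from the let binding)
last_ctx = lambda a, v: a >= 0 and v == a
print(classify(last_ctx, lambda a, v: v < a, 2))        # H2: unsafe
```

Enumeration keeps the sketch self-contained; a real implementation would replace the two `all`/`any` tests with implication and satisfiability queries to a Presburger solver.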

Totally and partially redundant checks are traditionally identified by two separate techniques. Forward data-flow analysis […], which determines available expressions, has been used primarily to identify totally redundant checks. An expression (or check) e is said to be available at program point p if some expression in the equivalence class of e has been computed on every path from the entry to p, and the constituents of e have not been redefined after that computation on any of these paths in the control flow graph (CFG). Using this information, the computation of an expression (or check) e at point p is redundant if e is available at that point.
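For comparison, the classical available-expressions analysis can be sketched as a forward "must" data-flow problem: the checks available on entry to a node are the intersection of those available on exit from all its predecessors. The CFG below, its node names, and the gen/kill encoding (a check is killed when one of its constituents is redefined) are hypothetical illustrations, not taken from the paper.

```python
def available_checks(nodes, preds, gen, kill, entry):
    """Forward must-analysis:
    AVAIL_in[n]  = intersection of AVAIL_out[p] over predecessors p
    AVAIL_out[n] = gen[n] | (AVAIL_in[n] - kill[n])
    """
    universe = set().union(*gen.values())
    # Optimistic initialisation: everything is available except at the entry node.
    avail_in = {n: set() if n == entry else set(universe) for n in nodes}
    avail_out = {n: gen[n] | (avail_in[n] - kill[n]) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            outs = [avail_out[p] for p in preds[n]]
            new_in = set.intersection(*outs) if outs else set()
            new_out = gen[n] | (new_in - kill[n])
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in

# Hypothetical diamond-shaped CFG: B0 branches to B1 and B2, which join at B3.
nodes = ["B0", "B1", "B2", "B3"]
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
gen = {"B0": {"i<n"}, "B1": set(), "B2": set(), "B3": {"i<n"}}
kill_b2 = {"B0": set(), "B1": set(), "B2": {"i<n"}, "B3": set()}  # B2 redefines i
no_kill = {n: set() for n in nodes}

print("i<n" in available_checks(nodes, preds, gen, kill_b2, "B0")["B3"])  # False
print("i<n" in available_checks(nodes, preds, gen, no_kill, "B0")["B3"])  # True
```

In the first run the check at B3 is not totally redundant, because the path through B2 redefines i; without that redefinition the check is available at B3 and could be eliminated there.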

Partially redundant checks are more difficult to handle. Traditionally, a backward data-flow analysis […] is used to determine the anticipatability of expressions. An expression (or check) e is anticipatable at program point p if e is computed on every path from p to the exit of the CFG before any of its constituents are redefined. By hoisting an anticipatable expression to its earliest safe program point, selected checks can be made totally redundant. Historically, hoisting of anticipatable checks has been deemed crucial for eliminating checks from loop-based programs. Unfortunately, hoisting of checks causes bound errors to be flagged at an earlier program point, creating problems for precise exception handling.

^ This includes the possibility that the check can be safely executed or it can be avoided.

In this paper, we propose a new approach to eliminating array bound checks. Our approach begins with a forward contextual-constraint analysis that synthesizes contexts for checks in a program. This is followed by a backward derivation of the weakest pre-conditions needed for checks to be eliminated.

For the example given above, our method determines that the lower bound check L1 in the function newsub is totally redundant, and that the upper bound check H2 in the function last is unsafe. Furthermore, the upper bound check H1 of newsub is determined to be partially redundant, with the derived pre-condition:

$$\mathit{pre}(H1) = (i \le -1) \lor (j < i \land 0 \le i) \lor (i < \mathit{length}(arr))$$
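As a sanity check, this pre-condition can be read as the weakest condition under which H1 holds whenever the if-branch context holds, namely $\neg(0 \le i \le j) \lor i < \mathit{length}(arr)$, with the negated context split into the disjoint cases $i \le -1$ and $j < i \land 0 \le i$. The sketch below is an illustration, not part of the authors' tool: it compares the two formulas by exhaustive enumeration over a small hypothetical range of integers, writing a for length(arr).

```python
from itertools import product

def pre_h1(a, i, j):
    # pre(H1) as derived in the text, with a standing for length(arr)
    return i <= -1 or (j < i and 0 <= i) or i < a

def weakest_for_h1(a, i, j):
    # not(branch context 0 <= i <= j) or (high bound check i < a)
    return not (0 <= i <= j) or i < a

R = range(-4, 6)  # small hypothetical test range
mismatches = [p for p in product(R, repeat=3) if pre_h1(*p) != weakest_for_h1(*p)]
print(mismatches)  # []
```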

To overcome the problems arising from hoisting of partially redundant checks, we propose to use program specialization to selectively enforce contexts that are strong enough for eliminating partially redundant checks. We note that such a specialization technique is also advocated by […] in their bound check optimization of Java programs.

Our new approach is built on top of earlier work on sized-type inference […], where we are able to automatically infer input/output size relations and also determine invariants for parameters of recursive functions over the sizes of the data structures used. The inference is performed accurately and efficiently with the help of a constraint solver for Presburger formulae […]. The presence of sized types greatly enhances inter-procedural analysis of contextual constraints, which are crucial for identifying both unsafe and totally redundant checks. More importantly, accurate contextual constraints also help in the derivation of safety pre-conditions for partially redundant checks. With the derived pre-conditions, we can provide specialized code to selectively eliminate partially redundant checks based on the available contexts. The specialization process can be further tuned to provide a range of time/space tradeoffs.

Our main contributions are: 

1. To the best of our knowledge, we are the first to handle partially redundant checks through the backward derivation of safety pre-conditions after contextual constraints have been gathered in a separate forward phase. This gives very accurate results for eliminating partially redundant checks.

2. We deal directly with recursive functions, and the invariant synthesis is performed only once for each recursive function, instead of for every occurrence of checks within the function. Except for […], whose methods are restricted to totally redundant checks, almost all previous work on bound check elimination deals only with loop-based programs.

3. We design a simple yet elegant approach to derive the weakest pre-condition 
(with respect to a given contextual constraint) for check elimination from 
the context of the check and the synthesized invariant. Our approach works 
seamlessly across recursive procedures. 

4. We support inter-procedural optimization through backward propagation of a function’s pre-condition to its callers, where it becomes a check.
5. We introduce three forms of bound check specialization: polyvariant for maximal specialization, monovariant for minimal code duplication, and duovariant specialization for a space/time tradeoff. While the idea of using context-based program specialization […] is not new, our work is novel in its use of pre-conditions for guiding effective specialization.

Section 2 gives an overview of our method by introducing sized types and the main steps towards bound check specialization. Section 3 formalizes the context synthesis as a forward analysis method. It also illustrates how invariants on recursive functions can be synthesized, so as to provide informative contexts for recursive functions. Section 4 describes the key steps for classifying checks, and the inter-procedural mechanism for deriving pre-conditions for each partially redundant check. Section 5 shows how the derived pre-conditions can be used to guide bound check specialization, while Section 6 shows that the cost of analysis is within acceptable limits. Related work is compared in Section 7 before we discuss some future directions of research in Section 8.



x ∈ Var (Variables)
a ∈ Arr (Array Names)
f ∈ Fname (Function Names)
n ∈ Int (Integer Constants)
L ∈ Label (Labels for checks)
p ∈ Prim (Primitives)
    p ::= + | − | * | / | > | = | != | < | >= | <= | not | or | and | length | newArr
κ ∈ Call (Calls)
    κ ::= f(x1, …, xn) | sub(a, x) | update(a, x1, x2)
e ∈ Exp (Expressions)
    e ::= x | n | p(x1, …, xn) | κ | if e0 then e1 else e2 | let x = e1 in e2
d ∈ Def (Function Definition)
    d ::= f(x1, …, xn) = e

Fig. 1. The Language Syntax



2 Overview 

We apply our technique to a first-order typed functional language with strict semantics. Recursive functions in the language are confined to self-recursion; currently, mutual recursion is encoded into self-recursion by appropriate tagging of input and output. The language is defined in Fig. 1. Note that the language syntax includes check labels (also called labels for brevity) that identify bound checks (i.e., array bound checks or checks that originate from these bound checks). Check labels appear syntactically at calls to functions/operations that involve bound checks. However, we do not label self-recursive calls, as recursive function definitions receive slightly different treatment (as explained in Section 3). Lastly, check labels are automatically inserted into programs by our analysis.

We restrict the arguments to a function to be just variables. This simplifies 
presentation, without loss of generality. 
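To make the syntax of Fig. 1 concrete, here is a minimal sketch of an abstract syntax tree in Python. The class names and the representation of check labels as a tuple attached to each call are assumptions made for illustration; arguments are plain variable names, reflecting the restriction just stated.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class PrimApp:
    op: str                        # a primitive such as "+", "<=" or "length"
    args: Tuple[str, ...]          # arguments are restricted to variables

@dataclass(frozen=True)
class Call:
    fname: str                     # a user-defined function, "sub" or "update"
    args: Tuple[str, ...]
    labels: Tuple[str, ...] = ()   # check labels such as ("L1", "H1"); none on self-recursive calls

@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"

@dataclass(frozen=True)
class Let:
    var: str
    bound: "Expr"
    body: "Expr"

Expr = Union[Var, Const, PrimApp, Call, If, Let]

@dataclass(frozen=True)
class Def:
    fname: str
    params: Tuple[str, ...]
    body: Expr

def labelled_calls(e):
    """Return the labelled calls occurring in an expression, left to right."""
    if isinstance(e, Call):
        return [e] if e.labels else []
    if isinstance(e, If):
        return labelled_calls(e.cond) + labelled_calls(e.then) + labelled_calls(e.orelse)
    if isinstance(e, Let):
        return labelled_calls(e.bound) + labelled_calls(e.body)
    return []

# last(arr) = let v = length(arr) in L2@H2@sub(arr, v)
last = Def("last", ("arr",),
           Let("v", PrimApp("length", ("arr",)), Call("sub", ("arr", "v"), ("L2", "H2"))))
print([c.labels for c in labelled_calls(last.body)])  # [('L2', 'H2')]
```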



Sized Type = (AnnType, F)

Annotated Type Expressions:
    v ∈ V (Size Variables)        t ∈ TVar (Type Variables)
    σ ∈ AnnType (Annotated Types)
    σ ::= ∀t . σ | τ | τ → τ
    τ ∈ Basic (Basic Type)
    τ ::= t | (τ1, …, τn) | Arr^v τ | Int^v | Bool^v

Presburger Formulae:
    n ∈ Z (Integer constants)     v ∈ V (Variable)
    φ ∈ F (Presburger Formulae)
    φ ::= b | φ1 ∧ φ2 | φ1 ∨ φ2 | ¬φ | ∃v . φ | ∀v . φ
    b ∈ BExp (Boolean Expression)
    b ::= True | False | a1 = a2 | a1 ≠ a2 | a1 < a2 | a1 > a2 | a1 ≤ a2 | a1 ≥ a2
    a ∈ AExp (Arithmetic Expression)
    a ::= n | v | n * a | a1 + a2 | −a

Fig. 2. Syntax of Sized Types



We only consider well-typed programs. We enhance the type system with the notion of sized types, which capture size information about the underlying expressions/values. For a function, the sized type reveals size relationships amongst the parameters and results of that function. The syntax of sized types is depicted in Fig. 2. A sized type is a pair containing an annotated type and a Presburger formula. An annotated type expression augments an ordinary type expression with size variables; the relationship among these variables is expressed in the associated formula. In this paper, we consider only three basic types: arrays, integers, and booleans. The annotated type for arrays is $\mathrm{Arr}^{v}\,\tau$, where v captures the array size; for integers, it is $\mathrm{Int}^{v}$, where v captures the integer value; for booleans, it is $\mathrm{Bool}^{v}$, where v can be either 0 or 1, representing the values False and True respectively. Occasionally, we omit writing size variables in the annotated type when these variables are unconstrained.

A sample program for our language is shown in Fig. 3. This program contains
four functions that implement binary search. The main function bsearch takes 
an array and a key in order to search for an element in the array. If found, the 
getmid :: (Arr^a Int, Int^l, Int^h) → (Int^m, Int)
Size a ≥ 0 ∧ 2m ≤ l + h ∧ l + h ≤ 1 + 2m
getmid(arr, lo, hi) = let m = (lo + hi)/2
                      in let x = L3@H3@sub(arr, m) in (m, x)

cmp :: (Int^i, Int^j) → Int^r
Size (i < j ∧ r = −1) ∨ (i = j ∧ r = 0) ∨ (i > j ∧ r = 1)
cmp(k, x) = if k < x then −1 else if k = x then 0 else 1

look :: (Arr^a Int, Int^l, Int^h, Int) → Int^r
Size (a ≥ 0) ∧ ((l ≤ h) ∨ (l > h ∧ r = −1))
Inv a* = a ∧ l ≤ h ∧ l ≤ l* ∧ h* ≤ h ∧
    2 + 2l + 2h* ≤ h + 3l* ∧ l + 2h* ≤ h + 2l*
look(arr, lo, hi, key) =
    if (lo <= hi) then
        let (m, x) = L4@H4@getmid(arr, lo, hi)
        in let t = cmp(key, x)
        in if t < 0 then look(arr, lo, m − 1, key)
           else if (t == 0) then m else look(arr, m + 1, hi, key)
    else −1

bsearch :: (Arr^a Int, Int) → Int
Size (a ≥ 0)
bsearch(arr, key) = let v = length(arr) in L5@H5@look(arr, 0, v − 1, key)

Fig. 3. Binary Search Program



corresponding array index is returned; otherwise −1 is returned. The recursive
invocation of binary search is carried out by the function look. 

2.1 Use of Sized Types 

The sized type of a function captures the relationship between the sizes of the function’s input and output. For instance, the annotated type for function cmp is $(\mathrm{Int}^{i}, \mathrm{Int}^{j}) \to \mathrm{Int}^{r}$, where i and j are the respective input values and r is its output. The size constraint (identified by the keyword Size) states three possible outputs for calling cmp, depending on whether the argument k is less than, equal to, or greater than the argument x.

Even more importantly, through sized-type inference […], we can synthesize, for a recursive function, an invariant that describes changes in the sizes of the input arguments of the function during its nested recursive-call invocations. For example, an accurate invariant relationship between the (first three) argument sizes of any nested recursive call to look, $a^*, l^*, h^*$, and the (first three) parameter sizes of the initial call to look, namely $a, l, h$, has been captured as the following Presburger formula:

$$\mathit{inv}(look) = a^* = a \land l \le h \land l \le l^* \land h^* \le h \land 2 + 2l + 2h^* \le h + 3l^* \land l + 2h^* \le h + 2l^*$$
This invariant tells us that, in the successive recursive invocations of look, the size of the first argument remains unchanged. Also, the values of the argument lo ($l^*$) in the successive calls never get smaller than its initial value l, while those of hi ($h^*$) never get bigger than the initial value h. For example, if the first call to look is look(2, 10) (showing only the lo and hi arguments), we know that successive recursive calls look(lo, hi) will satisfy the relationship $lo \ge 2 \land hi \le 10$. Terminology-wise, we call the initial set of arguments/size variables (e.g., a, l, and h) the initial arguments/sizes, and that of an arbitrarily nested recursive call (e.g., $a^*$, $l^*$, and $h^*$) the recursive arguments/sizes.
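The invariant can be sanity-checked on concrete runs. The sketch below is an illustration rather than part of the analysis: it transcribes look from Fig. 3 into Python (with cmp inlined), records the sizes $(a^*, l^*, h^*)$ of every nested recursive call starting from the initial call made by bsearch, and tests inv(look) on each one. The sorted test arrays and the range of search keys are arbitrary choices.

```python
def inv_look(a, l, h, a_r, l_r, h_r):
    """inv(look): relation between initial sizes (a, l, h) and recursive sizes (a*, l*, h*)."""
    return (a_r == a and l <= h and l <= l_r and h_r <= h
            and 2 + 2 * l + 2 * h_r <= h + 3 * l_r
            and l + 2 * h_r <= h + 2 * l_r)

def look(arr, lo, hi, key, trace):
    """Python transcription of look (cmp inlined); records (a, lo, hi) for every call."""
    trace.append((len(arr), lo, hi))
    if lo <= hi:
        m = (lo + hi) // 2          # floor division: 2m <= lo + hi <= 1 + 2m, as in getmid
        x = arr[m]
        if key < x:
            return look(arr, lo, m - 1, key, trace)
        if key == x:
            return m
        return look(arr, m + 1, hi, key, trace)
    return -1

def invariant_holds(arr, key):
    """Make bsearch's initial call and test inv(look) against every nested call."""
    trace = []
    look(arr, 0, len(arr) - 1, key, trace)
    (a, l, h), nested = trace[0], trace[1:]
    return all(inv_look(a, l, h, *call) for call in nested)

ok = all(invariant_holds(list(range(0, 2 * n, 2)), key)
         for n in range(20) for key in range(-1, 2 * n + 1))
print(ok)  # True
```

Such testing can only expose a wrong invariant, never prove a correct one; the paper's guarantee comes from the fixed-point computation described in Section 3.1.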



+ :: (Int^i, Int^j) → Int^k
    Size k = i + j

− :: (Int^i, Int^j) → Int^k
    Size k = i − j

= :: (Int^i, Int^j) → Bool^b
    Size (0 ≤ b ≤ 1) ∧ ((i = j ∧ b = 1) ∨ (i ≠ j ∧ b = 0))

sub :: ((Arr^a t), Int^i) → t
    Size a ≥ 0
    Req L: i ≥ 0; H: i < a

update :: ((Arr^a t), Int^i, t) → ()
    Size a ≥ 0
    Req L: i ≥ 0; H: i < a

length :: (Arr^a t) → Int^i
    Size a ≥ 0 ∧ a = i

newArr :: (Int^i, t) → Arr^a t
    Size (i ≥ 0 ∧ a = i)

Fig. 4. Sized Types of Some Primitives



Fig. 4 depicts the sized types of a collection of primitive functions used in the rest of this paper. For the array-access operations (sub and update), we also include their respective pre-conditions, which must be satisfied for the operations to be safe. These pre-conditions are identified by the keyword Req.

Once the sized types of related functions have been inferred, we proceed to 
handle bound check optimization. The process works in a bottom-up fashion, 
starting with functions at the bottom of the call hierarchy. We list the steps 
involved below. Throughout the rest of the paper, we use the binary search 
program depicted in Fig. 3 as the running example.

Step 1 Forward contextual-constraint analysis. 

Step 2 Backward pre-condition derivation. 

Step 3 Bound check specialization. 

These steps are described in detail in the following sections.

3 Context Synthesis 

We begin by determining the context within which a check occurs. This contextual information, called a contextual constraint, is described in Presburger form and is gathered by traversing the syntax tree of the function body, from the root of the tree to a check-labelled call. Constraints gathered during traversal include the constraint for selecting a branch (of an if-expression), assertions about the sizes of local variables, and post-conditions of function calls.



C :: Exp → Env → F → (AnnType × P(Label × F) × F)
    where Env = Var → (AnnType × F)

C[[x]] Γ ψ = let (τ, φ) = Γ[[x]] in (τ, ∅, φ)

C[[n]] Γ ψ = let v = newVar in (Int^v, ∅, (v = n))

C[[f(x1, …, xn)]] Γ ψ =
    let ((τ1, …, τn) → τ, φf) = α(Γ[[f]])
        (τ'i, φi) = Γ[[xi]]   ∀i ∈ {1, …, n}
        X = ⋃_{i=1}^{n} …
        φ = ∃X . φf ∧ (⋀_{i=1}^{n} (φi ∧ (eq τi τ'i)))
    in …

Fig. 5. Definition of the Context-Derivation Function C



The forward analysis C is employed to synthesize contextual constraints; it is defined in Fig. 5. We first explain how this is done for non-recursive functions, and describe the recursive case in the following section.

C operates on expressions. It takes in a sized-type environment Γ, which binds the program variables, primitives, and user-defined functions to their respective sized types. It produces a triple consisting of: the annotated type of the subject expression, a set of bindings between call labels appearing in the expression and their contextual constraints, and the size constraint of the subject expression.
During the traversal of the syntax tree, C updates Γ with the sized types of locally defined variables. It also maintains a constraint ψ that captures the context of the subject expression. Initially, ψ is set to the value True. When a branch of an if-expression is chosen, the constraint leading to this decision is captured in ψ. When a labelled call is encountered, its contextual constraint is derived by combining (via conjunction) ψ with related constraints kept in the environment Γ. The result is expressed as $\mathcal{F}_{\Gamma,\psi}$. Formally, it is defined as follows:

$$\mathcal{F}_{\Gamma,\psi} = \bigwedge \Big( \bigcup_{i \ge 0} \Psi_i \Big) \quad \text{where} \quad \Psi_0 = \{\psi\}, \qquad \Psi_{i+1} = \{\, \varphi \mid \exists x, \tau .\ \Gamma[\![x]\!] = (\tau, \varphi);\ \mathit{fv}(\varphi) \cap \mathit{fv}(\Psi_i) \ne \emptyset;\ \varphi \notin \textstyle\bigcup_{j \le i} \Psi_j \,\}$$

As the environment Γ is finite, the computation of $\mathcal{F}_{\Gamma,\psi}$ always terminates.
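One way to read this definition operationally is as a layered worklist computation: starting from ψ, repeatedly add the environment constraints that share a free size variable with the constraints added in the previous round, until no new constraint appears. The sketch below follows that reading; the representation of a constraint as a (text, free variables) pair and the example environment are hypothetical.

```python
def related_context(env, psi):
    """Collect the conjuncts of F_{Gamma,psi}, joined into one string.

    env: maps each program variable to (annotated type, constraint), where a
         constraint is a (text, frozenset of free size variables) pair.
    psi: the current context, in the same (text, free variables) form.
    """
    layer = [psi]             # Psi_0 = {psi}
    gathered = [psi]          # union of Psi_0 .. Psi_i, in discovery order
    while layer:
        layer_vars = set().union(*(fv for _, fv in layer))
        next_layer = []
        for _, phi in env.values():
            # phi joins Psi_{i+1} if it shares a variable with Psi_i and is new
            if phi not in gathered and phi[1] & layer_vars:
                gathered.append(phi)
                next_layer.append(phi)
        layer = next_layer
    return " and ".join(text for text, _ in gathered)

# Hypothetical environment, e.g. after let v = length(arr).
env = {
    "arr": ("Arr^a Int", ("a >= 0", frozenset({"a"}))),
    "v":   ("Int^v",     ("v = a",  frozenset({"v", "a"}))),
    "k":   ("Int^k",     ("k >= 5", frozenset({"k"}))),
}
psi = ("0 <= i < v", frozenset({"i", "v"}))
print(related_context(env, psi))  # 0 <= i < v and v = a and a >= 0
```

The constraint on k is never added because it shares no size variable with anything reachable from ψ; each constraint is added at most once, which mirrors the termination argument above.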

Notation-wise, in Fig. 5, function newVar returns a new variable. Function α performs renaming of size variables (like α-conversion). It is overloaded so that it can take in either an annotated type or a sized type (which is a pair). It consistently renames all size variables occurring in its argument. The operation $eq\ \tau_1\ \tau_2$ produces a conjunction that equates the corresponding size variables of two annotated types. For instance, $(eq\ \mathrm{Int}^{v}\ \mathrm{Int}^{w})$ produces the constraint $(v = w)$. Lastly, $\Gamma[x :: (\tau, \varphi)]$ denotes updating the environment Γ with a new binding of x to the sized type $(\tau, \varphi)$.

As an example, for the following function definition,

newsub :: (Arr^a Int, Int^i, Int^j) → Int
newsub(arr, i, j) = if 0 ≤ i ≤ j then L1@H1@sub(arr, i) else −1

C determines the context for the labelled call to be:

$$\mathit{ctx}(L1) = \mathit{ctx}(H1) = a \ge 0 \land 0 \le i \le j$$
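The way C threads the context ψ down to labelled calls can be illustrated by a much simplified sketch. It treats constraints as strings, represents expressions as hypothetical tuples, and seeds the context with the constraint a ≥ 0 taken from the parameter's sized type; the real C also computes annotated types and size constraints, which are omitted here.

```python
# Hypothetical tuple encoding of expressions:
#   ("if", cond, then_e, else_e)          cond is a constraint string
#   ("let", var, size_fact, bound, body)  size_fact relates var's size to others
#   ("call", labels, fname, args)         labelled call
#   ("leaf", text)                        any other expression

def collect(expr, ctx, out):
    """Map each check label to the conjunction of constraints on the path to it."""
    kind = expr[0]
    if kind == "if":
        _, cond, then_e, else_e = expr
        collect(then_e, ctx + [cond], out)
        collect(else_e, ctx + [f"not ({cond})"], out)
    elif kind == "let":
        _, _var, size_fact, bound, body = expr
        collect(bound, ctx, out)
        collect(body, ctx + [size_fact], out)
    elif kind == "call":
        _, labels, _fname, _args = expr
        for label in labels:
            out[label] = " and ".join(ctx)
    return out

newsub = ("if", "0 <= i <= j",
          ("call", ("L1", "H1"), "sub", ("arr", "i")),
          ("leaf", "-1"))
last = ("let", "v", "v = a", ("leaf", "length(arr)"),
        ("call", ("L2", "H2"), "sub", ("arr", "v")))

print(collect(newsub, ["a >= 0"], {}))
# {'L1': 'a >= 0 and 0 <= i <= j', 'H1': 'a >= 0 and 0 <= i <= j'}
print(collect(last, ["a >= 0"], {}))
# {'L2': 'a >= 0 and v = a', 'H2': 'a >= 0 and v = a'}
```

The context obtained for H2 contains v = a, which is exactly why the check v < a is classified as unsafe.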



3.1 Recursive-Call-Invariant Synthesis

Invariant synthesis is, in general, a hard problem for recursive function definitions. Two pieces of invariant information are useful. First, the invariant describing the input/output size relation (i.e., the sized type) of a recursive function can be propagated across functions to achieve better context synthesis. Computing such an invariant is not always possible, however, as the precise relation between input and output sizes is sometimes not expressible as a Presburger formula.

Second, the recursive-call invariant captures the argument-size relationship between an initial call and an arbitrarily nested recursive call of the same function. This relation is needed for synthesizing the contextual constraint of a labelled call invoked at an arbitrary nesting depth. Fortunately, such a relation can often be formulated precisely as a Presburger formula.

Computation of the recursive-call invariant proceeds as follows. We first compute the constraint relating the parameter sizes of a function and the argument sizes of all recursive calls textually occurring in the function body. Conceptually, this constraint spells out the change in argument sizes during one unfolding of the recursive call. It can be captured by a procedure similar to the context computation C. For example, consider the function look defined in the binary search program. We obtain the following constraint, which we call U:

$$U := [a,l,h] \to [a^*,l^*,h^*] : a^* = a \land h \ge l \land \exists m .\, \big(2m \le h + l \land h + l \le 1 + 2m \land ((l^* = l \land h^* = m - 1) \lor (l^* = m + 1 \land h^* = h))\big)$$

The notation used here is adapted from the Omega Calculator […]. It specifies the constraint between two sets of variables: $[a,l,h]$ and $[a^*,l^*,h^*]$. The former is the set of initial parameter sizes (called the source), and the latter is the set of argument sizes of a recursive call (called the target).

Next, we perform an inductive computation to infer the change in argument sizes resulting from an arbitrary number of recursive-call unfoldings. The result is the recursive-call invariant. In the case of look, we have:

$$\mathit{inv}(look) = a^* = a \land l \le h \land l \le l^* \land h^* \le h \land 2 + 2l + 2h^* \le h + 3l^* \land l + 2h^* \le h + 2l^*$$

Several researchers, including the present authors, have proposed different 
techniques for synthesizing invariants. As these techniques are complementary 
in power and efficiency, we believe a collection of these techniques is needed to 
do a decent job. This includes: 

Polyhedra analysis. This technique was proposed and developed by Cousot and Halbwachs […], as well as by King and his co-workers […]. It is an abstract interpretation approach to finding the input/output size relation through fixed-point computation over linear constraints. Both the convex-hull operation (to eliminate multiple disjuncts) and the widening operation (to generalize a constraint by dropping some conjuncts that cannot be subsumed by others in an ascending chain of constraints) are used as generalization techniques to ensure termination of the analysis. For recursive-call invariant computation, we modify this analysis by ignoring the degenerate (base) case of a recursive definition. As an example, Fig. 6 illustrates a trace of such a computation for the function look with the aid of the Omega Calculator.

In Fig. 6, lines beginning with # are comments; lines ending with ; are commands to the Omega Calculator […]; outputs from the Calculator are indented rightward. (hull U) computes the convex hull of U (viewed as a relation), and $\mathit{widen}(U_2, U_3)$ generalizes $U_2$ to yield a constraint $W_2$ such that both $U_2$ and $U_3$ are instances of $W_2$. (We refer the reader to the work of Halbwachs […] for a detailed description of these two operations.) union signifies disjunction; compose combines two constraints by matching (and eliminating) the target of the former with the source of the latter. The second and third steps of the trace are iterations of the fixed-point computation. The last command checks whether a fixed point has been reached.

Transitive-closure operation. This is a fixed-point operation provided in the Omega Calculator […]. Given a linear constraint expressed in the form of
# First Approximation
U1 := hull U;
      U1 = [a,l,h] → [a*,l*,h*] : a = a* ∧ l ≤ h, l* ∧ h* ≤ h ∧
           l + 2h* ≤ h + 2l* ∧ 2 + 2l + 2h* ≤ h + 3l* ∧
           h + 2l* ≤ 3 + l + 2h* ∧ h + 4l* ≤ 4 + 2l + 3h*

# Second Approximation
U2 := hull (U1 union (U1 compose U));
      U2 = [a,l,h] → [a*,l*,h*] : a = a* ∧ l ≤ h, l* ∧ h* ≤ h ∧
           h + 4l* ≤ 9 + l + 4h* ∧ l + 2h* ≤ h + 2l* ∧
           2 + 2l + 2h* ≤ h + 3l*

# Third Approximation
U3 := hull (U2 union (U2 compose U));
      U3 = [a,l,h] → [a*,l*,h*] : a = a* ∧ l ≤ h, l* ∧ h* ≤ h ∧
           h + 8l* ≤ 21 + l + 8h* ∧ 2 + 2l + 2h* ≤ h + 3l* ∧
           l + 2h* ≤ h + 2l*

# Apply Generalization by Widening
W2 := widen(U2, U3);
      W2 = [a,l,h] → [a*,l*,h*] : a = a* ∧ l ≤ h, l* ∧ h* ≤ h ∧
           2 + 2l + 2h* ≤ h + 3l* ∧ l + 2h* ≤ h + 2l*

# Is the result a fixed point?
(W2 compose U) subset W2;
      True

Fig. 6. A trace of Omega Calculation of Recursive-call Invariant



relation (such as U above), the transitive-closure operation aims to compute its least fixed point. The least fixed point of U is defined as $\bigcup_{i > 0} U^i$, where $U^1 = U$ and $U^{i+1} = U^i$ compose $U$. A “shortcoming” of this operation is that it does not support generalization to give an approximate fixed point if the least fixed point cannot be found.
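For a finite relation, the least fixed point $\bigcup_{i>0} U^i$ can be computed directly by repeated composition. The sketch below does this for a hypothetical finite slice of the one-step relation U of look (source and target sizes (l, h), with the array size dropped since $a^* = a$), and then confirms that every pair in the closure satisfies the widened invariant W2 of Fig. 6. It is only a bounded illustration; the Omega Calculator works symbolically on Presburger relations.

```python
def compose(r, s):
    """Relational composition: match the target of r with the source of s."""
    by_source = {}
    for x, y in s:
        by_source.setdefault(x, []).append(y)
    return {(x, z) for x, y in r for z in by_source.get(y, [])}

def transitive_closure(u):
    """Union of U^1, U^2, ... for a finite relation; stops once a new power adds nothing."""
    closure, power = set(u), set(u)
    while True:
        power = compose(power, u)
        if power <= closure:      # then every later power is contained as well
            return closure
        closure |= power

def one_step(n):
    """One unfolding of look on (lo, hi) pairs with 0 <= lo <= hi < n (cf. U)."""
    u = set()
    for l in range(n):
        for h in range(l, n):
            m = (l + h) // 2
            u.add(((l, h), (l, m - 1)))
            u.add(((l, h), (m + 1, h)))
    return u

def in_w2(src, tgt):
    """W2 from Fig. 6, with the array-size conjunct a = a* omitted."""
    (l, h), (ls, hs) = src, tgt
    return (l <= h and l <= ls and hs <= h
            and 2 + 2 * l + 2 * hs <= h + 3 * ls
            and l + 2 * hs <= h + 2 * ls)

closure = transitive_closure(one_step(12))
print(all(in_w2(s, t) for s, t in closure))  # True
```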

Generalized transitive closure. To overcome the limitation of Omega’s transitive-closure operation, we introduced in […] the concept of generalized transitive closure with selective generalization. Basically, it introduces generalizations of a size relation based on selective grouping and hulling of its various disjuncts. While hulling helps ensure termination of the fixed-point computation at the expense of accuracy, selective grouping and hulling help maintain the accuracy of such computation.

3.2 Context Synthesis for Recursive Functions 

For recursive functions, our analysis must derive the most informative contextual 
constraint that is applicable to all recursive invocations of the function, including 
the degenerated case. For a more accurate analysis, our method differentiates two 
closely-related contexts: (a) The context of a labelled call encountered during 
the first time the function call is invoked; ie., before any nested recursive call is 



Deriving Pre-conditions for Array Bound Check Elimination 



13 



invoked; and (b) the context of a labelled call encountered after some invocations of nested recursive calls. The reason for this separation is that the latter context is computed using the synthesized recursive-call invariant.

The contextual constraint of the first call is analyzed in the same way as that for a non-recursive function. For each label L of a recursive function f, the context of the first call is:

$$\mathit{ctxFst}(L) = \mathit{ctx}(L) \land \mathit{ctxSta}(f)$$

where ctx(L) is the derived contextual constraint at program point L, and ctxSta(f) denotes the default context that can be assumed at procedure entry of f. For function look, $\mathit{ctxSta}(\mathit{look}) = a \ge 0$ (i.e., the array must be of non-negative length).

The contextual constraint for a labelled call encountered after subsequent recursive invocations of f can be computed using:

$$\mathit{ctxRec}(L) = \mathit{inverse}(\mathit{ctx}(L)) \land \mathit{inv}(f) \land \mathit{ctxSta}(f)$$

Note that we make use of the synthesized invariant of f, namely inv(f), while the inverse operation (as defined in the Omega Calculator) is used to obtain a mirror copy of ctx(L) that applies to the recursive sizes (instead of the initial sizes).

Separate identification of contexts for the first recursive call and for subsequent recursive calls is instrumental to obtaining more accurate contextual constraints, which in turn induce more precise pre-conditions for eliminating recursive checks.

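For the function look, the labels used are L4 and H4. The context enclosing the labelled call is found to be $l \le h$. Following the above procedure, we obtain the following contextual constraints:

$$\begin{aligned}
\mathit{ctx}(L4) &= l \le h \\
\mathit{ctxSta}(\mathit{look}) &= a \ge 0 \\
\mathit{inverse}(\mathit{ctx}(L4)) &= l' \le h' \\
\mathit{inv}(\mathit{look}) &= a = a' \land l \le h \land l \le l' \land h' \le h \land 2 + 2l + 2h' \le h + 3l' \land l + 2h' \le h + 2l' \\
\mathit{ctxFst}(L4) &= l \le h \land a \ge 0 \\
\mathit{ctxRec}(L4) &= a = a' \land l \le l' \le h' \le h \land 0 \le a \land 2 + 2l + 2h' \le h + 3l' \land l + 2h' \le h + 2l'
\end{aligned}$$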
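The definition of look is given earlier in the paper and is not reproduced here, so the following sketch assumes a conventional binary-search step: when l <= h it computes m = (l + h) div 2 (floor division) and recurses on (l, m - 1) or (m + 1, h). Under that assumption it enumerates every nested recursive call reachable from small initial bounds and checks each against inv(look) as listed above. The function names are hypothetical, and exhaustive enumeration over a small range is a sanity check, not a proof.

```python
from collections import deque


def look_steps(l, h):
    """Recursive calls made by one activation of a hypothetical look(arr, l, h).

    Assumes look recurses only when l <= h, with m = (l + h) // 2 (floor
    division) and the two calls look(arr, l, m - 1) and look(arr, m + 1, h).
    """
    if l > h:
        return []
    m = (l + h) // 2
    return [(l, m - 1), (m + 1, h)]


def inv_look(l, h, l1, h1):
    """inv(look), with the unchanging array size a = a' left out."""
    return (l <= h and l <= l1 and h1 <= h
            and 2 + 2 * l + 2 * h1 <= h + 3 * l1
            and l + 2 * h1 <= h + 2 * l1)


def reachable_calls(l, h):
    """All (l', h') reached through one or more nested recursive calls."""
    seen, todo = set(), deque(look_steps(l, h))
    while todo:
        state = todo.popleft()
        if state not in seen:
            seen.add(state)
            todo.extend(look_steps(*state))
    return seen


def invariant_holds(lo=-8, hi=8):
    """Check inv(look) for every start l <= h drawn from [lo, hi)."""
    return all(inv_look(l, h, l1, h1)
               for l in range(lo, hi) for h in range(l, hi)
               for (l1, h1) in reachable_calls(l, h))
```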



4 Pre-condition Derivation 

The synthesis of contexts and invariants is essentially a forward analysis that gathers information about how values are computed and propagated and how the conditions of if-branches are inherited. In contrast, the derivation of pre-conditions for check elimination is inherently a backward problem. Here, the flow of information goes from callee to caller, with the goal of finding the weakest possible pre-condition which ensures that the operation can be performed safely without checking.

We propose a backward method for deriving safety pre-conditions. This 
method considers each function in turn, starting from the lowest one in the 
calling hierarchy. Our method attempts to derive the required pre-condition to 
make each check redundant. Working backwards, each pre-condition that we de- 
rive from a callee would be converted into a check for its caller. In this way, 
we are able to derive the pre-condition for each check, including those that are 
nested arbitrarily deep inside procedural calls. The main steps are summarized 
here. 

- Classify each check as unsafe, totally redundant or partially redundant.

- Derive a safety pre-condition for each partially redundant check. Checks from recursive functions must take into account the recursive invocations.

- Amalgamate related checks together.

- To support inter-procedural propagation, convert each required pre-condition of a function into a check at its call site based on the parameter instantiation.
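Processing functions from the bottom of the calling hierarchy amounts to a topological order of the call graph with callees first. A minimal sketch using Python's standard graphlib module (Python 3.9+) follows; the call graph in the usage example is an assumed rendering of the paper's examples (bsearch calls look, look calls itself and getMid, q calls p). Dropping self-edges is a simplification of this sketch: direct recursion is handled inside a function's own analysis, while mutual recursion would make TopologicalSorter raise CycleError and would need strongly connected components instead.

```python
from graphlib import TopologicalSorter


def bottom_up_order(calls):
    """Order functions so that every callee precedes its callers.

    `calls` maps a function name to the set of names it calls.
    """
    # Drop self-calls: a directly recursive function is analysed on its own.
    graph = {f: {g for g in callees if g != f} for f, callees in calls.items()}
    # TopologicalSorter treats each set as predecessors, so callees come first.
    return list(TopologicalSorter(graph).static_order())
```

For example, bottom_up_order({"bsearch": {"look"}, "look": {"look", "getMid"}, "q": {"p"}}) places getMid before look and look before bsearch, and p before q.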

To help describe our method, consider the following simple example:

p(arr,i,j) = if 0 ≤ i ≤ j then L6@H6@sub(arr,i) + L7@H7@sub(arr,i-1)
             else -1

This function is a minor modification of newsub. It takes an array and two integers i and j, and returns the sum of the elements at i and $i - 1$ if $0 \le i \le j$; otherwise $-1$ is returned. From the definition of this procedure, we can provide the following sized type for p:

$$p :: (\mathit{Arr}^{m}\,\mathit{Int},\ \mathit{Int}^{i},\ \mathit{Int}^{j}) \to \mathit{Int}^{r}$$

$$\mathit{Size}\quad m \ge 0 \land \big((0 \le i \le j) \lor ((i < 0 \lor (i > j \land i \ge 0)) \land r = -1)\big)$$
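To make the four checks concrete, here is a Python transliteration of p in which every subscript performs its lower and upper bound checks explicitly; Python's own indexing would silently accept arr[-1]. The helper sub and its label argument are hypothetical, and the guard assumes the reading $0 \le i \le j$ used in the sized type above.

```python
def sub(arr, idx, label):
    """Subscript with explicit lower and upper bound checks (hypothetical helper)."""
    if idx < 0:                    # lower-bound check, e.g. chk(L6): i >= 0
        raise IndexError(f"{label}: lower bound violated at {idx}")
    if idx >= len(arr):            # upper-bound check, e.g. chk(H6): i < m
        raise IndexError(f"{label}: upper bound violated at {idx}")
    return arr[idx]


def p(arr, i, j):
    """p from the text, assuming the guard 0 <= i <= j."""
    if 0 <= i <= j:
        return sub(arr, i, "L6/H6") + sub(arr, i - 1, "L7/H7")
    return -1
```

With i = 0 the guard holds but the L7 check fails, which is why that check cannot be removed unconditionally.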



4.1 Check Classification 

We classify each check as either totally redundant, partially redundant or unsafe. Given a check chk(L) under a context ctx(L), we can capture the weakest pre-condition, pre(L), that enables chk(L) to become redundant. The weakest pre-condition is computed using:

$$\mathit{pre}(L) = \lnot\mathit{ctx}(L) \lor \mathit{chk}(L)$$

This pre-condition should be simplified using the invariant context at procedure entry, namely ctxSta(p), whose validity is verified by our sized-type system. If pre(L) = True, we classify the check as totally redundant. If pre(L) = False (or unknown due to the limitations of the Presburger solver), we

In Omega, this simplification can be done by a special operator called gist.
classify the check as unsafe. Otherwise, the check is said to be partially redundant.

Example: The four checks in p are:

$$\mathit{chk}(L6) = i \ge 0 \qquad \mathit{chk}(H6) = i < m$$

$$\mathit{chk}(L7) = i - 1 \ge 0 \qquad \mathit{chk}(H7) = i - 1 < m$$

Of these four checks, only the pre-condition of the check at L6, namely $\mathit{pre}(L6) = \lnot\mathit{ctx}(L6) \lor \mathit{chk}(L6)$, evaluates to True. Hence, chk(L6) is totally redundant, while the other three checks are partially redundant. In this example, we use the following contextual constraints:

$$\mathit{ctx}(L6) = \mathit{ctx}(L7) = \mathit{ctx}(H6) = \mathit{ctx}(H7)$$

$$\mathit{ctx}(L6) = \mathit{ctxSta}(p) \land (0 \le i \le j) \quad\text{and}\quad \mathit{ctxSta}(p) = m \ge 0$$
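When no Presburger solver is at hand, the classification rule can be mimicked by brute force. The sketch below is a stand-in, not the authors' implementation: it evaluates pre = not ctx or chk at every state of a small integer box, generates only states with m >= 0 (playing the role of simplification with ctxSta(p)), and reproduces the classification of p's four checks. Results are valid only inside the assumed box.

```python
from itertools import product

# Assumed bounded box of states; only m >= 0 is generated, which plays the
# role of simplifying with ctxSta(p) = m >= 0.
BOX = [dict(m=m, i=i, j=j)
       for m, i, j in product(range(0, 6), range(-3, 8), range(-3, 8))]


def ctx_p(s):
    # ctx(L6) = ctx(H6) = ctx(L7) = ctx(H7) = ctxSta(p) and 0 <= i <= j
    return s["m"] >= 0 and 0 <= s["i"] <= s["j"]


CHECKS = {
    "L6": lambda s: s["i"] >= 0,
    "H6": lambda s: s["i"] < s["m"],
    "L7": lambda s: s["i"] - 1 >= 0,
    "H7": lambda s: s["i"] - 1 < s["m"],
}


def classify(ctx, chk, states=BOX):
    """Classify a check from pre = (not ctx) or chk, evaluated on every state."""
    pre = [(not ctx(s)) or chk(s) for s in states]
    if all(pre):
        return "totally redundant"
    if not any(pre):
        return "unsafe"
    return "partially redundant"
```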



4.2 Derivation of Pre-condition 

The derivation of pre(L) depends to a large extent on ctx(L): a more informative ctx(L) can lead to a better pre(L). For a given contextual constraint ctx(L), pre(L) can be computed by:

$$\mathit{pre}(L) = \lnot\mathit{ctx}(L) \lor \mathit{chk}(L)$$

The following lemma characterizes pre(L) as the weakest pre-condition. We omit the proof in this paper.

Lemma 1. The weakest pre-condition (pre) for the safe elimination of a check (chk) in a given context (ctx) is $\mathit{pre} = \lnot\mathit{ctx} \lor \mathit{chk}$.
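Although the proof of Lemma 1 is omitted, the claim is easy to test on a finite universe, where a predicate is simply a set of states. The sketch below checks both halves: pre and ctx together imply chk, and every predicate P with P and ctx implying chk is contained in pre. This exhaustive check on small sets illustrates the lemma; it is not a substitute for the proof.

```python
from itertools import combinations


def subsets(universe):
    """Every subset of a finite universe; each one is a candidate predicate."""
    items = sorted(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def lemma1_holds(universe, ctx, chk):
    """Check Lemma 1 with predicates represented as sets of states."""
    pre = (universe - ctx) | chk             # pre = not ctx or chk
    sound = (pre & ctx) <= chk               # pre and ctx imply chk
    # Weakest: any P with (P and ctx => chk) is contained in pre.
    weakest = all(p <= pre for p in subsets(universe) if (p & ctx) <= chk)
    return sound and weakest
```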

Example: Using the above formula, we can derive the following pre-conditions for the three partially redundant checks:

$$\begin{aligned}
\mathit{pre}(H6) &= (i \le -1) \lor (j < i \land 0 \le i) \lor (i < m) \\
\mathit{pre}(L7) &= (i \le -1) \lor (j < i \land 0 \le i) \lor (i \ge 1) \\
\mathit{pre}(H7) &= (i \le -1) \lor (j < i \land 0 \le i) \lor (i \le m)
\end{aligned}$$

Deriving pre-conditions for the elimination of checks from recursive procedures is more challenging. A key problem is that the check may be executed repeatedly, and any derived pre-condition must ensure that the check is completely eliminated. One well-known technique for the elimination of checks from loop-based programs is the loop limit substitution method of Pj. Depending on the direction of monotonicity, the check of either the first or last iteration of the loop is used as a condition for the elimination of all checks. However, this method is restricted to checks on monotonic parameters whose limits can be precisely calculated.
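For comparison, the idea behind loop limit substitution can be sketched for the simplest monotonic case: a counting loop for i in range(lo, hi) that reads a[i] from an array of length n. This is a hypothetical illustration of the idea as summarized above, not the cited method's formulation. Because i only increases, only the first iteration can fail the lower-bound check and only the last can fail the upper-bound check.

```python
def all_checks_pass(lo, hi, n):
    """Run every check of `for i in range(lo, hi): a[i]` with len(a) == n."""
    return all(0 <= i < n for i in range(lo, hi))


def limit_condition(lo, hi, n):
    """Test only the loop limits, exploiting that i increases monotonically.

    The lower-bound check can only fail at the first iteration (i = lo) and
    the upper-bound check only at the last one (i = hi - 1); an empty loop
    performs no checks at all.
    """
    return hi <= lo or (0 <= lo and hi - 1 < n)
```

The test compares the two functions on every small (lo, hi, n), which is exactly the equivalence that justifies hoisting the two limit checks out of the loop.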

We propose a more general method to handle recursive checks. For better 
precision, our approach separates out the context of the initial recursive call 
from the context of the subsequent recursive calls. The latter context may use 
the invariant of recursive parameters from sized typing. 
Using the recursive look function (whose parameters are non-monotonic) as an example, we shall provide two separate checks for the first and subsequent recursive calls, namely:

$$\mathit{chkFst}(L4) = 0 \le l + h \quad\text{and}\quad \mathit{chkRec}(L4) = 0 \le l' + h'$$

with their respective contexts:

$$\begin{aligned}
\mathit{ctxFst}(L4) &= a \ge 0 \land l \le h \\
\mathit{ctxRec}(L4) &= a \ge 0 \land l' \le h' \land \mathit{inv}(\mathit{look}) \\
\mathit{inv}(\mathit{look}) &= a = a' \land l \le l' \land l \le h \land h' \le h \land 2 + 2l + 2h' \le h + 3l' \land l + 2h' \le h + 2l'
\end{aligned}$$

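We next derive the pre-conditions for the two checks separately, as follows:

$$\begin{aligned}
\mathit{preFst}(L4) &= \lnot\mathit{ctxFst}(L4) \lor \mathit{chkFst}(L4) \\
&= (h < l) \lor (0 \le l + h) \\
\mathit{preRec}(L4) &= \lnot\mathit{ctxRec}(L4) \lor \mathit{chkRec}(L4) \\
&= \forall l', h'.\ \lnot(a \ge 0 \land l' \le h' \land l \le h \land l \le l' \land h' \le h \land 2 + 2l + 2h' \le h + 3l' \land l + 2h' \le h + 2l') \lor (0 \le l' + h') \\
&= (h \le l) \lor (0 \le l \le h) \lor (l = -1 \land h = 0)
\end{aligned}$$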

Note that preRec is naturally expressed in terms of the recursive variables. However, we must re-express each pre-condition in terms of the initial variables. Hence, universal quantification is used to remove the recursive variables.

We can now combine the two pre-conditions in order to obtain a single safety pre-condition for the recursive check, as shown here:

$$\mathit{pre}(L4) = \mathit{preFst}(L4) \land \mathit{preRec}(L4) = (h < l) \lor (0 \le l + h \land 0 \le l)$$
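The universal quantification in preRec can be checked by enumeration, because inv(look) together with l' <= h' confines l' and h' to the interval [l, h]. The following sketch (hypothetical function names, with the array size a dropped because it plays no role) computes preFst and preRec directly and compares them with the closed forms given above over a small range of initial bounds.

```python
def inv_look(l, h, l1, h1):
    """inv(look) from the text, with the array size (a = a') left out."""
    return (l <= h and l <= l1 and h1 <= h
            and 2 + 2 * l + 2 * h1 <= h + 3 * l1
            and l + 2 * h1 <= h + 2 * l1)


def pre_fst(l, h):
    # not ctxFst(L4) or chkFst(L4); a >= 0 holds by ctxSta(look).
    return not (l <= h) or 0 <= l + h


def pre_rec(l, h):
    """forall l', h'. not (l' <= h' and inv(look)) or 0 <= l' + h'.

    inv(look) and l' <= h' force l <= l' <= h' <= h, so enumerating
    the interval [l, h] covers every candidate (l', h').
    """
    span = range(l, h + 1)
    return all(not (l1 <= h1 and inv_look(l, h, l1, h1)) or 0 <= l1 + h1
               for l1 in span for h1 in span)


def pre_rec_closed(l, h):
    return h <= l or 0 <= l <= h or (l == -1 and h == 0)


def pre_l4_closed(l, h):
    return h < l or (0 <= l + h and 0 <= l)
```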

Through a similar derivation, the other check, H4, which is based on the pre-condition $l + h < 2a$ from getMid, yields:

$$\mathit{pre}(H4) = \mathit{preFst}(H4) \land \mathit{preRec}(H4) = (h < l) \lor (h < a \land l + h < 2a)$$

The derived pre-conditions are very precise. Apart from ensuring that the given recursive checks are safe, they also capture a condition under which the checks are avoided altogether.



4.3 Amalgamating Related Checks 

As some of the checks are closely related, it may be useful to amalgamate these 
checks together. At the risk of missing out some opportunities for optimization, 
the amalgamation of related checks serves two purposes, namely: 

In general, any two checks can be amalgamated together. However, closely related checks have a higher probability of being satisfied at the same time, which helps ensure that amalgamation causes no loss of optimization.
- It can cut down the time taken for our analysis.

- It can reduce the number of specialization points, and hence the size of the 
specialized code. 

We propose a simple technique to identify related checks. Given two checks $C_1$ and $C_2$, we consider them to be related if either $C_1 \Rightarrow C_2$ or $C_2 \Rightarrow C_1$. For example, checks H6 and H7 are related since $\mathit{chk}(H6) \Rightarrow \mathit{chk}(H7)$. Because of this relationship, we can combine the pre-conditions of these two checks, as follows:

$$\mathit{pre}(H6, H7) = \mathit{pre}(H6) \land \mathit{pre}(H7) = (i \le -1) \lor (j < i \land 0 \le i) \lor (i < m)$$

The combined pre-condition can eliminate both checks simultaneously. 
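The relatedness test and the amalgamated pre-condition can be checked in the same brute-force style. In the sketch below, a stand-in for the Presburger solver over an assumed bounded box of states with m >= 0, implication between checks is tested state by state, and the conjunction pre(H6) and pre(H7) is compared with the simplified form given above.

```python
from itertools import product

# Assumed bounded box of (m, i, j) states with m >= 0 (ctxSta(p)).
BOX = list(product(range(0, 6), range(-3, 8), range(-3, 8)))


def implies(p, q, states=BOX):
    return all(q(*s) for s in states if p(*s))


def related(c1, c2, states=BOX):
    """Two checks are related if either one implies the other."""
    return implies(c1, c2, states) or implies(c2, c1, states)


def ctx_p(m, i, j):
    return m >= 0 and 0 <= i <= j


def chk_h6(m, i, j):
    return i < m


def chk_h7(m, i, j):
    return i - 1 < m


def chk_l7(m, i, j):
    return i - 1 >= 0


def pre(chk):
    """Weakest pre-condition (Lemma 1): not ctx or chk."""
    return lambda m, i, j: (not ctx_p(m, i, j)) or chk(m, i, j)


def pre_h6_h7(m, i, j):
    return pre(chk_h6)(m, i, j) and pre(chk_h7)(m, i, j)
```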



4.4 Inter-procedural Propagation of Checks 

To support inter-procedural propagation of checks, each pre-condition for a par- 
tially redundant check must first be converted into a new check at the call 
site. After that, the process of classifying the check and deriving its safety pre- 
condition is repeated. 

Consider two functions:

f(v1,..,vn) = ... L@sub(arr,i) ...

g(w1,..,wn) = ... C@f(v'1,..,v'n) ...

Suppose that in f, we have managed to derive a non-trivial pre(L) that would make chk(L) redundant. Then, at each call site of f, such as f(v'1,..,v'n) in the body of function g, we should convert the pre-condition of f into a new check at the call site, as follows:

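$$\mathit{chk}(C) = \exists X.\ \mathit{pre}(L) \land \mathit{subs}(C)$$

$$\mathit{subs}(C) = \bigwedge_{i=1}^{n} \mathit{eq}(\tau_i, \tau'_i) \quad\text{where}\quad v_i :: \tau_i;\ v'_i :: \tau'_i;\ X = \textstyle\bigcup_{i=1}^{n} \mathit{fv}(\tau_i)$$

The pre-condition of f is converted into a check via a size parameter substitution, subs(C).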

Example: Consider a function q:

$$q :: (\mathit{Arr}^{n}\,\mathit{Int},\ \mathit{Int}^{k}) \to \mathit{Int}^{s}$$

q(arr,k) = let r = random(); l = k + 1 in C8@C9@p(arr,r,l)

At the labelled call site, we have:

$$\mathit{subs}(C8) = \mathit{subs}(C9) \quad\text{and}\quad \mathit{subs}(C9) = (m = n) \land (j = l) \land (i = r)$$

We assume that the size variables assigned to the arguments of the p call are n, r and l, respectively. Using our formula for converting the pre-condition of p into a check at its call site, we obtain:




18 Wei-Ngan Chin, Siau-Cheng Khoo, and Dana N. Xu 



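$$\begin{aligned}
\mathit{chk}(C8) &= \exists i, j.\ \mathit{pre}(L7) \land \mathit{subs}(C8) \\
&= (r \le -1) \lor (l < r \land 0 \le r) \lor (r \ge 1) \\
\mathit{chk}(C9) &= \exists i, j.\ \mathit{pre}(H6, H7) \land \mathit{subs}(C9) \\
&= (r \le -1) \lor (l < r \land 0 \le r) \lor (r < n)
\end{aligned}$$

With this, we can propagate the checks backwards across the procedure q by deriving the following two pre-conditions:

$$\begin{aligned}
\mathit{pre}(C8) &= \forall r, l.\ \lnot(l = k + 1) \lor \big((r \le -1) \lor (l < r \land 0 \le r) \lor (r \ge 1)\big) \\
&= k \le -2 \\
\mathit{pre}(C9) &= \forall r, l.\ \lnot(l = k + 1) \lor \big((r \le -1) \lor (l < r \land 0 \le r) \lor (r < n)\big) \\
&= (k \le -2) \lor (-1 \le k \le n - 2)
\end{aligned}$$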



Note that since r and l are local variables, we must eliminate them from our pre-condition by universal quantification. Universal quantification ensures that the new pre-condition is safe for all values of r and l.
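The two-stage conversion (existential elimination of p's size variables, then universal elimination of q's locals) can be replayed numerically. In this sketch, subs(C) pins i, j and m to r, l and n, so the existential quantifier collapses to a substitution; the universal quantifier over r is approximated by an assumed finite range R_RANGE, which is wide enough for the k and n values tested. Function names are hypothetical.

```python
R_RANGE = range(-12, 13)   # assumed finite stand-in for every value of random()


def pre_l7(i, j, m):
    return i <= -1 or (j < i and 0 <= i) or i >= 1


def pre_h6_h7(i, j, m):
    return i <= -1 or (j < i and 0 <= i) or i < m


def chk_at_call(pre, r, l, n):
    """exists i, j. pre and subs(C): subs fixes i = r, j = l and m = n,
    so the existential quantifier reduces to a substitution."""
    return pre(r, l, n)


def pre_in_q(pre, k, n):
    """forall r, l. not (l = k + 1) or chk(C), with r ranging over R_RANGE."""
    l = k + 1                  # the only l that makes the antecedent true
    return all(chk_at_call(pre, r, l, n) for r in R_RANGE)
```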

Inter-procedural propagation of checks applies to recursive functions without 
exception. 

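Example: The pre-conditions for look can be converted to checks at its call site in bsearch, as follows:

$$\begin{aligned}
\mathit{chk}(L5) &= \exists l, h.\ \mathit{pre}(L4) \land \mathit{subs}(L5) \\
&= \exists l, h.\ \big((h < l) \lor (0 \le l + h \land 0 \le l)\big) \land (l = 0 \land h = v - 1) \\
&= (v \le 0) \lor (1 \le v) \\
\mathit{chk}(H5) &= \exists l, h.\ \mathit{pre}(H4) \land \mathit{subs}(L5) \\
&= \exists l, h.\ \big((h < l) \lor (h < a \land l + h < 2a)\big) \land (l = 0 \land h = v - 1) \\
&= (v \le 0) \lor (v \le a \land v \le 2a) \\
\mathit{subs}(L5) &= (l = 0) \land (h = v - 1)
\end{aligned}$$

From here, we can derive the safety pre-conditions for bsearch as shown below.

$$\begin{aligned}
\mathit{pre}(L5) &= \lnot\mathit{ctx}(L5) \lor \mathit{chk}(L5) \\
&= \forall v.\ \lnot(v = a \land a \ge 0) \lor (v \le 0 \lor 1 \le v) \\
&= \mathit{True} \\
\mathit{pre}(H5) &= \lnot\mathit{ctx}(H5) \lor \mathit{chk}(H5) \\
&= \forall v.\ \lnot(v = a \land a \ge 0) \lor \big(v \le 0 \lor (v \le a \land v \le 2a)\big) \\
&= \mathit{True}
\end{aligned}$$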

Through this inter-procedural propagation, we have successfully determined 
that the recursive checks of look inside bsearch are totally redundant. Hence, all 
bound checks for bsearch can be completely eliminated. This is done by providing 
specialized versions of look and getmid (without bound checks) that would be 
called from bsearch. 
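A short numeric check of the bsearch result follows. It takes the context at L5 and H5 to be v = a and a >= 0, as in the derivation above, so only v = a needs testing; the range of a is an arbitrary finite sample, and the function names are hypothetical.

```python
def pre_l4(l, h):
    return h < l or (0 <= l + h and 0 <= l)


def pre_h4(l, h, a):
    return h < l or (h < a and l + h < 2 * a)


def chk_l5(v):
    # exists l, h. pre(L4) and l = 0 and h = v - 1: again a pure substitution.
    return pre_l4(0, v - 1)


def chk_h5(v, a):
    return pre_h4(0, v - 1, a)


def bsearch_pre(a_values=range(0, 50)):
    """forall v. not (v = a and a >= 0) or chk, for a sample of a >= 0.

    The antecedent leaves only v = a, so each check is tested at v = a.
    """
    return (all(chk_l5(a) for a in a_values),
            all(chk_h5(a, a) for a in a_values))
```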







5 Bound Check Specialization 

With the derived pre-condition for each partially redundant check, we can now proceed to eliminate more bound checks by specializing each call site with respect to its context. The apparatus for bound check specialization is essentially the same as contextual specialization PHEI, where each function call can be specialized with respect to its context of use. A novelty of our method is the use of derived pre-conditions to guide specialization. This approach is fully automatic and can give better reuse of specialized code.

Suppose that we have a function f with N checks that is used in one of its parent functions g as follows:

f(v1,..,vn) = tf req {P1,..,PN}

g(w1,..,wn) = ... C@f(v'1,..,v'n) ...

Notation-wise, we write C@f(v'1,..,v'n) as the short form for C1@...@CN@f(v'1,..,v'n).

Suppose further that we have a context ctx(C) for the labelled call, which may encompass a context inherited from the specialization of g. Let the set of pre-conditions whose checks could be made redundant be:

$$G = \{P_i \mid i \in 1..N \land \mathit{ctx}(C) \Rightarrow \mathit{chk}(C_i)\}$$

For maximal bound check optimization, we should specialize each call to f into a version that maximizes bound check elimination. In the above example, our specialization would introduce $f_G$, as follows:

g(w1,..,wn) = ... fG(v'1,..,v'n) ...

$$f_G(v_1,..,v_n) = \mathcal{S}[t_f]\,G \quad\text{where}\quad \mathit{ctxSta}(f_G) = G \land \mathit{ctxSta}(f)$$

Note how the context G, which contains the maximal set of pre-conditions that are satisfiable in ctx(C), is propagated inside the body of f by the specializer S. This specialization is commonly known as polyvariant specialization. It generates a specialized version of the code for each unique set of checks that can be eliminated, and can thus provide as many variants of the specialized code as there are distinguishable contexts. To minimize the number of variants used, the specialization process proceeds top-down from the main function, and generates a specialized version only if it is required directly (or indirectly) by the main function. Polyvariant specialization can help maximize the elimination of bound checks. However, there is a potential explosion of code size, as the maximum number of specialized variants for each function is $2^N$, where N is the number of partially redundant checks that exist. In practice, such code explosion seldom occurs, unless the function is heavily reused under different contexts.
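The bookkeeping behind polyvariant specialization can be sketched as a cache keyed by the set G: call sites whose contexts make the same pre-conditions hold share one variant, so at most $2^N$ variants arise. In this hypothetical sketch, the specializer S is replaced by a generated variant name, contexts are predicates over p's own size variables (i, j, m), and implication is checked over an assumed bounded box.

```python
from itertools import product

# Assumed bounded box of (i, j, m) states used to decide implications.
STATES = list(product(range(-2, 6), repeat=3))


def pre_h6(i, j, m):
    return i <= -1 or (j < i and 0 <= i) or i < m


def pre_l7(i, j, m):
    return i <= -1 or (j < i and 0 <= i) or i >= 1


def pre_h7(i, j, m):
    return i <= -1 or (j < i and 0 <= i) or i <= m


PRES = {"H6": pre_h6, "L7": pre_l7, "H7": pre_h7}


def satisfied(ctx):
    """G: names of the pre-conditions implied by a call-site context."""
    return frozenset(name for name, pre in PRES.items()
                     if all(pre(*s) for s in STATES if ctx(*s)))


class Specializer:
    """Polyvariant specialization: one variant of p per distinct set G."""

    def __init__(self):
        self.variants = {}

    def specialize(self, ctx):
        g = satisfied(ctx)
        if g not in self.variants:    # otherwise reuse the existing variant
            suffix = "_".join(sorted(g)) or "checked"
            self.variants[g] = f"p_{suffix}"
        return self.variants[g]
```

For instance, the contexts 1 <= i <= j < m and 2 <= i <= j < m both map to the variant p_H6_H7_L7, while 0 <= i <= j < m gets p_H6_H7, since i = 0 still needs the L7 check.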

If code size is a major issue (say for embedded systems), we could use either 
monovariant specialization or duovariant specialization. 

In monovariant specialization, we need an analysis technique to help identify the best common context, call it ctxMin(f), that is satisfied by all the call sites. Let the call sites to f in a given program be labelled C1, .., CM, with corresponding contexts ctx(C1), .., ctx(CM). We define the best common context of these call sites to be:

$$\mathit{ctxMin}(f) = \{P_i \mid i \in 1..N,\ \forall j \in 1..M.\ \mathit{ctx}(C_j) \Rightarrow \mathit{chk}(C_{ij})\}$$

With this most informative common context, we can now provide a least specialized variant for f that can be used by all call sites in our program, as follows:

$$f_{\min}(v_1,..,v_n) = \mathcal{S}[t_f]\,\mathit{ctxMin}(f) \quad\text{where}\quad \mathit{ctxSta}(f_{\min}) = \mathit{ctxMin}(f) \land \mathit{ctxSta}(f)$$

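For duovariant specialization, we shall generate a version of each function f that is maximally specialized, namely:

$$f_{\max}(v_1,..,v_n) = \mathcal{S}[t_f]\,\mathit{ctxMax}(f) \quad\text{where}\quad \mathit{ctxSta}(f_{\max}) = \mathit{ctxMax}(f) \land \mathit{ctxSta}(f)$$

$$\mathit{ctxMax}(f) = \{P_i \mid i \in 1..N,\ \exists j \in 1..M.\ \mathit{ctx}(C_j) \Rightarrow \mathit{chk}(C_{ij})\}$$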
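Given, for each call site, the set of pre-conditions its context satisfies, ctxMin and ctxMax reduce to set intersection and union. The sketch below assumes the reading of ctxMax above (pre-conditions satisfied at some call site) and a hypothetical selection rule: a call site uses f_max only when its context satisfies all of ctxMax(f), and otherwise falls back to f_min, which every site satisfies by construction. Variant and function names are illustrative.

```python
def ctx_min(per_site):
    """Pre-conditions implied at every call site (assumes at least one site)."""
    first, *rest = per_site
    return first.intersection(*rest)


def ctx_max(per_site):
    """Pre-conditions implied at some call site."""
    return frozenset().union(*per_site)


def pick_variant(site, per_site):
    """Use f_max when a site satisfies all of ctxMax(f), else fall back to f_min."""
    return "f_max" if ctx_max(per_site) <= site else "f_min"
```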

This most specialized variant should be used whenever possible. With the 
three variants of bound check specialization, we now have a spread of the clas- 
sic space-time tradeoff. We hope to investigate the cost-effectiveness of these 
alternatives in the near future. 



6 Performance Analysis 

In this section, we address the practicality of using constraint solving for im- 
plementing both our forward analysis (essentially sized typing) and backward 
analysis (for deriving pre-conditions). 

Our experiments were performed with Omega Library 1.10, running on a 
Sun System 450. We took our examples mostly from with the exception of 
sumarray from The reported time measurements are the average values out 
of 50 runs. The first column reports the time taken by forward analysis (largely 
for computing invariants), while the second column reports the time taken for 
backward derivation of safety pre-condition. 

The results show that the time taken by the analyses required for array bound check optimization is largely acceptable. A slightly higher analysis time was reported for hanoi, due largely to the more complex recursive invariant being synthesized.

Our analysis determines that all checks in these examples are totally redundant. Consequently, they are eliminated in the specialized code. Gains in run-time efficiency range from 8% (for the “sumarray” program) to 56% (for the “matrix mult” program), which is comparable to the gains found in the literature (such as pnji).
Program          Forward   Backward
bcopy            0.03      –
binary search    0.54      –
bubble sort      0.05      0.31
dot product      0.03      0.21
hanoi            1.59      2.74
matrix mult      0.12      0.98
queens           0.19      0.53
sumarray         0.03      0.42

Fig. 7. Computation Time (in Secs) for Forward and Backward Analyses



7 Related Work 

Traditionally, data flow analysis techniques have been employed to gather available information for the purpose of identifying redundant checks, and anticipatable information for the purpose of hoisting partially redundant checks to a more profitable location. The techniques employed have gradually increased in sophistication, from the use of family of checks in US], to the use of difference constraints in 0. While the efficiency of these techniques is not in question, data flow analysis techniques are inadequate for handling checks from recursive procedures, as deeper invariants are often required.

To handle checks from programs with more complex control flow, verification-based methods have also been advocated by Suzuki and Ishihata [25], Necula and Lee ^0| and Xu et al. [23], whilst Cousot and Halbwachs 0 have advocated the use of abstract interpretation techniques. Whilst powerful, these methods have so far been restricted to eliminating totally redundant checks.

It is interesting to note that the basic idea behind the backward derivation of weakest pre-conditions was already present in the inductive iteration method, pioneered by Suzuki and Ishihata [25] and more recently improved by Xu et al. [23]. However, the primary focus has been on finding totally redundant checks. Due to this focus, the backward analysis technique proposed in [23] actually gathers both pre-conditions and contextual constraints together. Apart from missing out on partially redundant checks, their approach is less accurate than forward methods (such as |S|) since information on local variables is often lost in backward analysis.

Xi and Pfenning have advocated the use of dependent types for array bound check elimination |2E|. While it is possible to specify pre-conditions through dependent types, they do not specially handle partially redundant checks. Moreover, the onus for supplying suitable dependent types rests squarely on the programmers.

Recently, Rugina and Rinard m proposed an analysis method to synthesize symbolic bounds for recursive functions. In their method, every variable is expressed in terms of a lower and an upper symbolic bound. By assuming a polynomial form for the symbolic bounds, their method is able to compute these bounds without using fixed-point iteration. In some sense, this inductive approach is similar to the proposal made in ElEI], where size information is inductively captured to analyze program termination properties. Whilst the efficiency of the inductive approach is not in question, we have yet to investigate the loss in precision that comes with fixing the expected target form.



8 Conclusion and Future Work 

Through a novel combination of a forward technique to compute contextual constraints and a backward method to derive weakest pre-conditions, we now have a comprehensive method for handling both totally redundant and partially redundant checks. Both analysis methods are built on top of a Presburger constraint solver that has been shown to be both accurate and practically efficient m. Our new approach is noteworthy in its superior handling of partially redundant checks.

There are several promising directions for future research. They deal largely 
with how the precision of optimization and efficiency of analysis method could 
be further improved. 

Firstly, our contextual constraint analysis presently inherits its constraints largely from conditional branches. We can further improve its accuracy by propagating prior bound checks in accordance with the flow of control. For this to work properly, we must be able to identify the weakest pre-conditions for each function that can be asserted as post-conditions after each call has been successfully executed. As bound errors could be caught by exception handling, the extent of check propagation would be limited to the scope where the bound errors are uncaught.

Secondly, not all partially redundant checks can be eliminated by their callers’ contexts. In this scenario, it may be profitable to insert speculative tests that capitalize on the possibility that the safety pre-conditions hold at runtime. Whilst the idea of inserting speculative runtime tests is simple to implement, two important issues need to be investigated: (i) what test to insert, and (ii) where and when it will be profitable to insert the selected test. Specifically, we may strip out the avoidance condition from the speculative test, and restrict such runtime tests to recursive checks only.

Lastly, the efficiency of our method should be carefully investigated. The cost-benefit tradeoff of check amalgamation and bound check specialization would need to be carefully studied in order to come up with a practically useful strategy. Also, the sophistication (and cost) of our approach is affected by the type of constraints that are supported. Whilst Presburger formulae have been found to be both precise and efficient, it may still be useful to explore other constraint domains.

The insertion of speculative tests may look similar to check hoisting. The key difference is that no exception is raised if a speculative test fails.
Abstract. We present a type-based specification for useless-variable elimination for a higher-order, call-by-value functional language. Utilizing a weak form of dependent types, we introduce a mechanism for eliminating at runtime useless code that is not detectable statically. We prove the specifications sound and safe with respect to an operational semantics, ensuring that eliminated expressions contributed no observable behavior to the original program. We define an algorithm that implements useless-variable elimination without dependent types, and we prove this algorithm correct with respect to the specification.



1 Introduction 

A variable can be considered useless if its value contributes nothing to the behavior of the program. Code can be considered useless if its computation and value (for code corresponding to expressions) contribute nothing to the behavior of a program. Useless variables and code arise from various program transformations (e.g., arity raising, partial evaluation), from the maintenance and evolution of software, and from programs extracted from proofs. Useless-variable elimination (UVE) and useless-code elimination (UCE) are the operations of removing useless formal parameters (variables) and corresponding actual parameters (code) from a program. (More generally, useless-code elimination can also include detecting and eliminating dead or unreachable code.) These optimizations reduce the number of arguments passed to a function, eliminate unneeded computations, and potentially shorten the lifetime of some variables. All of these aspects contribute to the importance of UVE as an optimization of program representation.

We study UVE and UCE using an approach based on type inference in which 
types convey information about the use or need of variables and code. We demon- 
strate that by considering richer notions of types we can detect and eliminate 
more occurrences of useless items. We begin with a type system based on sim- 
ple types with a very simple form of subtyping. This specification can detect 
and eliminate many useless variables and expressions, but the type system also 
imposes too many constraints, prohibiting the identification of additional occur- 
rences of useless items. We then extend the types and type system to include 

* This work is supported in part by NSF Award #CCR-9900918. 



O. Danvy and A. Filinski (Eds.): PADO-II, LNCS 2053, pp. 25-ESI 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001 



26 



Adam Fischbach and John Hannan 



a weak form of dependent types. Not only does this allow us to identify more 
useless variables and expressions, it also allows us to identify expressions that 
can be ignored at run time even though they cannot be eliminated statically. 
Our use of dependent types is closely related to the use of conjunctive types to 
perform UCE 0. 

Unlike flow-based approaches (BlEli), type-based ones are relatively straightforward and reasonably efficient. Except for the restriction to a typed language, our approach (and other type-based ones) can detect and eliminate more useless code than existing flow-based approaches. The correctness proofs for our approach (see 0) are significantly simpler than the proofs for flow-based approaches.

Unlike other type-based approaches to this problem 001 , we apply annota- 
tions to function arrows to indicate the use/need of a function’s parameter and 
corresponding arguments. These other approaches consider replacing the type of 
a useless variable or expression with a type that indicates uselessness. Consider the function $\lambda x : \mathit{int}.\,4$. While other approaches use type inference to compute a type such as (unit → int) 0 and (w'^^ ^ 0, we infer the type $(\mathit{int} \to^{u} \mathit{int})$.

For many kinds of examples, these approaches are roughly equivalent in their ex- 
pressive power, but as we include richer type features, like dependent types, our 
approach appears to offer a simpler and more powerful framework for detecting 
more useless variables and code. 

The remainder of the paper is organized as follows. In the next section we introduce a simple, typed functional language that we use for both the source and target of our translation. The language contains annotations regarding useless variables and expressions. In Section 3 we define useless-variable (and useless-code) elimination via a type system that enforces constraints regarding use between types and terms. We demonstrate the correctness of this specification with respect to the operational semantics. In the following section we consider an extension to our simple-type specification that includes dependent types. We motivate the need for improvements to our original specification and justify how this extension increases the ability to detect more useless variables and code. Next, we present and prove correct an algorithm for UVE without dependent types. We conclude in the final section.

2 Language Definition 

We introduce a simple, typed, higher-order functional language with recursive function definitions. To support the specification of useless code and useless variables, the language assumes a set of dummy variables, indexed by type, and two forms of application.

$$e ::= n \mid d^{\tau} \mid x \mid \lambda x.e \mid e\ @_n\ e \mid e\ @_u\ e \mid \mathit{ifz}\ e\ e\ e \mid \mu f.\lambda x.e \mid e + e$$

Each dummy variable is annotated with its simple type τ. The annotations n and u indicate possibly needed and definitely unneeded operands, respectively. The goal of UVE is to replace occurrences of ‘@n’ with ‘@u’ where it is safe to do so.

We assume a call-by-value operational semantics for this language, axiomatized by a judgment of the form $\rho \rhd e \Downarrow v$, in which ρ is an environment, e is an expression and v is a value. The rules are almost entirely standard except for application and recursion. For application, we have two rules corresponding to the two forms of annotations:

$$\frac{\rho \rhd e_1 \Downarrow [\rho', \lambda x.e] \qquad \rho \rhd e_2 \Downarrow v_2 \qquad \rho'\{x \mapsto v_2\} \rhd e \Downarrow v}{\rho \rhd (e_1\ @_n\ e_2) \Downarrow v}
\qquad
\frac{\rho \rhd e_1 \Downarrow [\rho', \lambda x.e] \qquad \rho' \rhd e \Downarrow v}{\rho \rhd (e_1\ @_u\ e_2) \Downarrow v}$$

For handling recursion, we use substitution instead of environments:

$$\rho \rhd \mu f.\lambda x.e \Downarrow [\rho, \lambda x.e[\mu f.\lambda x.e/f]]$$

This formulation of operational semantics provides a convenient setting in which to reason about the correctness of our work.
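The two application rules can be illustrated with a small evaluator. This is a Python sketch with hypothetical class names; it deviates from the rules above in one deliberate way, handling μf.λx.e with a closure whose environment refers back to itself rather than by substitution, and it omits dummy variables. The point to notice is the @u case: the operand is never evaluated and the parameter is never bound.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Expr"


@dataclass(frozen=True)
class App:
    fn: "Expr"
    arg: "Expr"
    ann: str                 # "n": possibly needed, "u": definitely unneeded


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Ifz:
    test: "Expr"
    then: "Expr"
    other: "Expr"


@dataclass(frozen=True)
class Fix:                   # mu f. lambda x. body
    fname: str
    param: str
    body: "Expr"


Expr = Union[Num, Var, Lam, App, Add, Ifz, Fix]


class Closure:
    """The value [rho, lambda x. e]: a function body with its environment."""

    def __init__(self, env, param, body):
        self.env, self.param, self.body = env, param, body


def evaluate(e, env):
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Lam):
        return Closure(env, e.param, e.body)
    if isinstance(e, Fix):
        # Deviation from the text: tie the knot in the environment
        # instead of substituting mu f. lambda x. e for f.
        rec_env = dict(env)
        closure = Closure(rec_env, e.param, e.body)
        rec_env[e.fname] = closure
        return closure
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Ifz):
        branch = e.then if evaluate(e.test, env) == 0 else e.other
        return evaluate(branch, env)
    if isinstance(e, App):
        closure = evaluate(e.fn, env)
        if e.ann == "n":     # evaluate the operand and bind it to x
            arg = evaluate(e.arg, env)
            return evaluate(closure.body, {**closure.env, closure.param: arg})
        # "u": the operand is never evaluated and x is never bound
        return evaluate(closure.body, closure.env)
    raise TypeError(f"unknown expression: {e!r}")
```

For example, (λx.4) @u y evaluates to 4 even when y is unbound, whereas the @n version fails on the unbound variable.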

We can restrict our attention to a sublanguage of this language that does not 
contain dummy variables or applications annotated by u. The resulting language 
is a traditional, typed functional language. We drop the n annotation from appli- 
cations since it is superfluous in this sublanguage. We consider this sublanguage 
as the input language to our UVE analysis. 

The type system for our language is almost entirely standard, axiomatized by a judgment of the form $\Gamma \rhd e : \tau$. The types consist of simple types. The annotations on applications are ignored by the type system. We include a rule for typing dummy variables:

$$\Gamma \rhd d^{\tau} : \tau$$



3 UVE with Simple Types 

We give a formal specification of the UVE transformation using simple types 
annotated with use information. We axiomatize a deductive system that relates 
a source term, its annotated type, and a target term (a UVE-form of the source 
term). The inference rules follow the structure of a traditional type system for 
simple types with additional constraints supporting UVE. 



3.1 UVE Types 

As in previous work we use annotations on types to provide a more detailed characterization of expressions: the type of an expression will convey information regarding the use of variables in the expression. The inclusion of these annotations on inference rules in a type system introduces constraints which ensure that these need properties are valid. (This validity is provided by a type





consistency result.) For UVE, we use annotations corresponding to needed and 
unneeded (useless) variables: 



$$a ::= n \mid u$$

The annotation a on a function type indicates the approximate need of the formal parameter of a function of that type.

$$\tau ::= \mathit{int} \mid \tau \to^{a} \tau$$

A function of type $\tau_1 \to^{u} \tau_2$ does not need its argument, e.g., $\lambda x.1 : \tau \to^{u} \mathit{int}$, and a function of type $\tau_1 \to^{n} \tau_2$ may need its argument, e.g., $\lambda x.x+1 : \mathit{int} \to^{n} \mathit{int}$.

As should be expected, we must conservatively estimate the need of a variable, possibly indicating that a variable is needed when, in fact, it may not be. To increase the precision of this approximation, we define an ordering on types.

Definition 1. Let $a \le a'$ and $\tau \le \tau'$ be defined by the following rules:

$$u \le n \qquad a \le a \qquad \mathrm{int} \le \mathrm{int} \qquad \frac{\tau_1' \le \tau_1 \quad \tau_2 \le \tau_2' \quad a \le a'}{(\tau_1 \xrightarrow{a} \tau_2) \le (\tau_1' \xrightarrow{a'} \tau_2')}$$

Intuitively, if $\tau \le \tau'$ and we have an expression $e$ of type $\tau$, then we can also consider (use) $e$ as an expression of type $\tau'$.
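To make the contravariance in Definition 1 concrete, here is a minimal Python sketch of the ordering. It is our own illustration, not part of the paper: the classes `Int` and `Arrow` and the functions `ann_leq` and `ty_leq` are hypothetical names, and annotations are encoded as the strings `"u"` and `"n"`.

```python
from dataclasses import dataclass
from typing import Union


def ann_leq(a: str, b: str) -> bool:
    """a ≤ a': reflexive, plus u ≤ n."""
    return a == b or (a == "u" and b == "n")


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Arrow:
    arg: "Ty"
    ann: str  # "u" (unneeded) or "n" (possibly needed)
    res: "Ty"


Ty = Union[Int, Arrow]


def ty_leq(t: Ty, t2: Ty) -> bool:
    """t ≤ t2 per Definition 1: contravariant argument, covariant result and annotation."""
    if isinstance(t, Int) and isinstance(t2, Int):
        return True
    if isinstance(t, Arrow) and isinstance(t2, Arrow):
        return ty_leq(t2.arg, t.arg) and ty_leq(t.res, t2.res) and ann_leq(t.ann, t2.ann)
    return False


INT = Int()
f1_ty = Arrow(INT, "u", INT)  # type of λx.3
f2_ty = Arrow(INT, "n", INT)  # type of λx.x+1
print(ty_leq(f1_ty, f2_ty), ty_leq(f2_ty, f1_ty))  # True False
```

Comparing the argument positions with the operands swapped (`ty_leq(t2.arg, t.arg)`) is what makes the relation contravariant there.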

3.2 Specification of UVE 

With these types defined, we can introduce useless-variable elimination as a type-based deductive system. We introduce the judgment $\Gamma \rhd e : \tau \Rightarrow e'$ in which $\Gamma$ is a type context, $e$ is an expression in our input language, $e'$ is an expression with all detected useless code eliminated, and $\tau$ is the annotated type of $e'$. The judgment is axiomatized by the rules in Figure 1. We assume the expression $e$ is well-typed (in the language’s type system).

We have two rules each for λ-abstractions and applications: one for the elimination case (eliminating a formal parameter or argument) and one for the default case (no elimination). In the first rule for abstraction (eliminating a parameter), we use the condition that $x$ not occur free in the translation of the function’s body to determine that the parameter is useless. The second rule for abstractions is straightforward.
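The choice between the two abstraction rules can be sketched as follows. This is a hypothetical illustration: the AST classes and `abstraction_annotation` are our own names, and the translated body is assumed to already contain dummy variables in place of eliminated operands. Because the $n$-rule always applies, choosing $u$ whenever $x \notin FV(e')$ is the more precise of the two derivable annotations.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical AST for translated terms e' (names are ours, not the paper's).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"  # an eliminated operand is a dummy variable such as Var("d")


@dataclass(frozen=True)
class Add:
    left: "Term"
    right: "Term"


Term = Union[Var, Lit, Lam, App, Add]


def fv(e: Term) -> FrozenSet[str]:
    """Free variables of a translated term."""
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, Lit):
        return frozenset()
    if isinstance(e, Lam):
        return fv(e.body) - {e.param}
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.left) | fv(e.right)  # Add


def abstraction_annotation(param: str, translated_body: Term) -> str:
    """'u' if the eliminating abstraction rule applies (x not in FV(e')), else 'n'."""
    return "u" if param not in fv(translated_body) else "n"


print(abstraction_annotation("x", Lit(3)))                 # u  (λx.3)
print(abstraction_annotation("x", Add(Var("x"), Lit(1))))  # n  (λx.x+1)
```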

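The two rules for handling application use type information to ensure that the annotation on applications is consistent with the type of the operator. A type consistency result will ensure that if $e_1$ has type $\tau_2 \xrightarrow{u} \tau$ then its value will be a function of that type, i.e., a function that does not need its argument. The operand of an application annotated by $u$ must have no effect; otherwise, eliminating it might change the behavior of the program. In our language, the only expressions that generate effects are addition (due to the possibility of overflow) and recursion (due to the possibility of nontermination). We assume a precomputed effect analysis on programs. We refrain from presenting it here or including it as part of the UVE analysis, as this approach would only complicate the presentation of useless-variable elimination while contributing very little to the understanding of useless variables.

Type Systems for Useless-Variable Elimination

$$\frac{}{\Gamma \rhd n : \mathrm{int} \Rightarrow n} \qquad \frac{\Gamma(x) = \tau}{\Gamma \rhd x : \tau \Rightarrow x}$$

$$\frac{\Gamma\{x : \tau_1\} \rhd e : \tau_2 \Rightarrow e' \quad x \notin FV(e')}{\Gamma \rhd \lambda x.e : \tau_1 \xrightarrow{u} \tau_2 \Rightarrow \lambda x.e'} \qquad \frac{\Gamma\{x : \tau_1\} \rhd e : \tau_2 \Rightarrow e'}{\Gamma \rhd \lambda x.e : \tau_1 \xrightarrow{n} \tau_2 \Rightarrow \lambda x.e'}$$

$$\frac{\Gamma\{f : \tau\} \rhd \lambda x.e : \tau \Rightarrow \lambda x.e'}{\Gamma \rhd \mu f.\lambda x.e : \tau \Rightarrow \mu f.\lambda x.e'} \qquad \frac{\Gamma \rhd e_1 : \tau_2 \xrightarrow{u} \tau \Rightarrow e_1' \quad \mathit{noeffect}(e_2)}{\Gamma \rhd (e_1 \mathbin{@} e_2) : \tau \Rightarrow (e_1' \mathbin{@_u} d^{\tau_2})}$$

$$\frac{\Gamma \rhd e_1 : \tau_2 \xrightarrow{n} \tau \Rightarrow e_1' \quad \Gamma \rhd e_2 : \tau_2' \Rightarrow e_2' \quad \tau_2' \le \tau_2}{\Gamma \rhd (e_1 \mathbin{@} e_2) : \tau \Rightarrow (e_1' \mathbin{@_n} e_2')}$$

$$\frac{\Gamma \rhd e_1 : \mathrm{int} \Rightarrow e_1' \quad \Gamma \rhd e_2 : \tau_2 \Rightarrow e_2' \quad \Gamma \rhd e_3 : \tau_3 \Rightarrow e_3' \quad \tau_2 \le \tau \quad \tau_3 \le \tau}{\Gamma \rhd (\mathtt{ifz}\ e_1\ e_2\ e_3) : \tau \Rightarrow (\mathtt{ifz}\ e_1'\ e_2'\ e_3')}$$

$$\frac{\Gamma \rhd e_1 : \mathrm{int} \Rightarrow e_1' \quad \Gamma \rhd e_2 : \mathrm{int} \Rightarrow e_2'}{\Gamma \rhd (e_1 + e_2) : \mathrm{int} \Rightarrow (e_1' + e_2')}$$

Fig. 1. UVE with Subtypes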
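The paper deliberately leaves the effect analysis unspecified. Purely as an illustration, one very coarse approximation (our assumption, not the authors' analysis) treats variables, literals, and abstractions, including recursive ones, as effect-free, since under the operational semantics they evaluate directly to values, and treats every addition and application as possibly effectful. The tuple encoding and the name `noeffect` as a Python function are hypothetical.

```python
# Terms as tagged tuples (our own encoding, for illustration only):
#   ("var", x) ("lit", n) ("lam", x, body) ("mu", f, lam)
#   ("app", e1, e2) ("add", e1, e2) ("ifz", e1, e2, e3)


def noeffect(e: tuple) -> bool:
    """Conservative check: True only if evaluating e can neither overflow nor diverge."""
    tag = e[0]
    if tag in ("var", "lit", "lam", "mu"):
        # These evaluate directly to a value (mu f.lam x.e yields a closure).
        return True
    if tag == "ifz":
        return all(noeffect(sub) for sub in e[1:])
    # "add" may overflow; "app" may call a recursive function and not terminate.
    return False


N = ("lam", "y", ("add", ("var", "y"), ("lit", 1)))  # an abstraction: effect-free
print(noeffect(N))                                   # True
print(noeffect(("add", ("lit", 1), ("lit", 2))))     # False
```

A real effect analysis would be more precise, for example by recognizing applications of known non-recursive functions as effect-free.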

As an example of the capabilities of our specification consider the following 
term in which N and M are some effect-free terms. (The use of let is only for 
improved clarity.) 

let f1 = λx.3
    f2 = λx.x+1
    g  = λk.(k @ N)
    h  = λk.(k @ N)
in (g @ f1) + (g @ f2) + (f1 @ M) + (h @ f1) end

Our specification produces the following expression:

let f1 = λx.3
    f2 = λx.x+1
    g  = λk.(k @n N)
    h  = λk.(k @u d^int)
in (g @n f1) + (g @n f2) + (f1 @u d^int) + (h @n f1) end

Even though f1 and f2 are both passed as arguments to g, they do not need to have identical types. They only need to have a common supertype (here $\mathrm{int} \xrightarrow{n} \mathrm{int}$). But the application (k @ N) in the body of g needs its argument because f2 needs its argument. The application (k @ N) in the body of h does not need its argument because f1 does not need its argument.
3.3 Correctness

We consider two aspects of correctness. First, because we consider a typed lan- 
guage, we ensure that the results of the UVE transformation are always well- 
typed programs. This result trivially follows from the structure of our specifica- 
tion because it enforces the traditional simple-type rules and the transformation 
only replaces variables and expressions with other variables and expressions of 
the same type. Second, we ensure that the transformation preserves the behavior 
of programs. Because we represent both the language’s definition (type system 
and operational semantics) and the UVE analysis/transformation as deductive 
systems, relating these is straightforward and requires few auxiliary results. 

For the operational correctness of our analysis and transformation, we want 
to ensure that a source program and its transformed (UVE-eliminated) form 
have the same behavior. This means that one must evaluate to a value iff the 
other also evaluates to that value (at a base type). Our theorem incorporates 
such an equivalence between terms and also a type consistency result. The latter 
is actually crucial to proving the former. 

To express a statement of operational equivalence we need to define a relation 
between values and a relation among environments and contexts: 

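Definition 2. The relations $v : \tau \Rightarrow v'$ and $\rho : \Gamma \Rightarrow \rho'$ are the least relations satisfying the following conditions:

1. $n : \mathrm{int} \Rightarrow n$ (for all integers $n$);

2. $[\rho, \lambda x.e] : \tau \Rightarrow [\rho', e']$ if there exist $\tau'$ and $\Gamma$ such that $\rho : \Gamma \Rightarrow \rho'$, $\tau' \le \tau$, and $\Gamma \rhd \lambda x.e : \tau' \Rightarrow e'$ is derivable;

3. $\rho : \Gamma \Rightarrow \rho'$ if $dom(\rho) = dom(\Gamma)$, $dom(\rho') \subseteq dom(\rho)$, and for all $x \in dom(\rho')$, $\rho(x) : \Gamma(x) \Rightarrow \rho'(x)$.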

In the second case we use $e'$ for the term in the closure corresponding to $v'$ in anticipation of developments in the next section. For the purposes of the current section, we could have used $\lambda x.e'$ instead. The justification for the condition $\tau' \le \tau$ in the second case is that we might have an expression like (k @n N) in which k has type $\mathrm{int} \xrightarrow{n} \mathrm{int}$ but is bound to a value (f1) of type $\mathrm{int} \xrightarrow{u} \mathrm{int}$ (because, for example, the application rule allows a function $\lambda x.e$ of type $\tau_1 \xrightarrow{a} \tau_2$ to be applied to an argument $e'$ of some type $\tau_1'$ such that $\tau_1' \le \tau_1$). Note that in the last case we only have $dom(\rho') \subseteq dom(\rho)$. This inclusion reflects the useless variables eliminated in the translated term.

Theorem 1. If $\Gamma \rhd e : \tau \Rightarrow m$, $\rho : \Gamma \Rightarrow \rho'$, and $FV(m) \subseteq dom(\rho')$, then

1. if $\rho \rhd e \Downarrow v$ then there exists a $v'$ such that $\rho' \rhd m \Downarrow v'$ and $v : \tau \Rightarrow v'$;

2. if $\rho' \rhd m \Downarrow v'$ then there exists a $v$ such that $\rho \rhd e \Downarrow v$ and $v : \tau \Rightarrow v'$.

Part 1 expresses the soundness of our analysis and part 2 expresses its safety. The proof of both parts can be found in [?].

As a special case of the theorem, we consider closed terms at base type:

Corollary 1. If $\cdot \rhd e : \mathrm{int} \Rightarrow m$ then $\cdot \rhd e \Downarrow n$ iff $\cdot \rhd m \Downarrow n$.
4 UVE with Dependent Function Types

While simple types with subtyping detect and eliminate many instances of useless variables and code, the typing constraints imposed by that system can still force a function to have some type $\tau$ due to its context, even though, when considered by itself, it has a type strictly less than $\tau$.



4.1 Motivating the Need for Dependent Types 

Consider the following example [?]:

let f = λh.λz.(h @ z)
in (f @ (λx.3) @ N) + (f @ (λx.x) @ 7) end

in which function f has type $(\mathrm{int} \to \mathrm{int}) \to \mathrm{int} \to \mathrm{int}$, and variables x, y, z, and h have type int, int, int, and $(\mathrm{int} \to \mathrm{int})$, respectively.

Because f is bound to a term that applies its first argument to its second, the term N is useless code. (We again assume N to be an effect-free term.) To identify this, the first occurrence of f should have type $(\mathrm{int} \xrightarrow{u} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{u} \mathrm{int}$, while the second occurrence of f must have type $(\mathrm{int} \xrightarrow{n} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{n} \mathrm{int}$. Each of these is a type that can be assigned to the term (λh.λz.(h @ z)) when considered by itself. But in the context of the example above, the term can only be assigned the latter type, since (λx.x) has type $\mathrm{int} \xrightarrow{n} \mathrm{int}$, which is not a subtype of $\mathrm{int} \xrightarrow{u} \mathrm{int}$. So in the system described in Figure 1 we would be forced to identify N as needed.

The two types of f given above are, however, both instances of a more general type involving annotation variables: $(\mathrm{int} \xrightarrow{\gamma} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{\gamma} \mathrm{int}$. This observation might lead one to pursue a notion of polymorphism for annotations, leading to a type such as $\forall\gamma.(\mathrm{int} \xrightarrow{\gamma} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{\gamma} \mathrm{int}$. However, the translated body of f would be (h @γ z), and the type system would not allow us to assign a meaning to the annotation. We observe that the need of the operand depends on the input to the function. This leads us to consider a form of dependent types involving the quantification of annotation variables that allows us to explicitly pass annotations to functions. Alternatively, we could view this approach as a form of explicit polymorphism in which the quantification is only over annotation variables (not type variables).

This approach is in contrast to the use of conjunctive types by Damiani [2], though the ideas are similar. In [?], Kobayashi uses ML-style let polymorphism to handle the above example, since he does not make use of annotations but rather assigns the type unit to useless expressions.

To handle dependent types we need to extend the definitions of annotations, types, terms, and operational semantics. We also distinguish between two kinds of useless code: static and dynamic. Static useless code is what we have previously identified as useless code: code that can statically be determined to be useless. Dynamic useless code is code that can be determined to be useless at runtime, so that an implementation can avoid executing it. Some dynamically useless code might always be useless, or it might sometimes be useful. In either case, we cannot statically determine that it is always useless, and so we cannot eliminate it from the translated term.

4.2 A Language with Dependent Types 

We extend the definition of annotations and types to include annotation variables (ranged over by $\gamma$) and dependent types that abstract annotation variables:

$$a ::= u \mid n \mid \gamma$$
$$\tau ::= \mathrm{int} \mid \tau \xrightarrow{a} \tau \mid \Pi\gamma.\tau$$

We also extend the definition of the language to support annotation-variable abstraction and application:

$$e ::= \cdots \mid \Lambda\gamma.e \mid e \mathbin{@} a$$

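Along with this extension of the syntax, we give new inference rules for the static and dynamic semantics of the new features:

$$\frac{\Gamma \rhd e : \tau \quad \gamma \notin FV(\Gamma)}{\Gamma \rhd \Lambda\gamma.e : \Pi\gamma.\tau} \qquad \frac{\Gamma \rhd e : \Pi\gamma.\tau}{\Gamma \rhd e \mathbin{@} a : \tau[a/\gamma]}$$

$$\frac{}{\rho \rhd \Lambda\gamma.e \Downarrow [\rho, \Lambda\gamma.e]} \qquad \frac{\rho \rhd e \Downarrow [\rho', \Lambda\gamma.e'] \quad \rho' \rhd e'[a/\gamma] \Downarrow v}{\rho \rhd (e \mathbin{@} a) \Downarrow v}$$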

To maintain a simple environment structure, we handle the mapping of annotation variables to annotations by substitution instead of by bindings in an environment.

Using this extended language, we can modify the example above to incorporate explicit annotation abstraction and application:

let f = Λγ.λh.λz.(h @γ z)
in (f @ u @n (λx.3) @u d^int) + (f @ n @n (λx.x) @n 7) end

in which f has type $\Pi\gamma.(\mathrm{int} \xrightarrow{\gamma} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{\gamma} \mathrm{int}$ and we have eliminated N.
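The annotation-application rule instantiates a Π-type by substitution. Below is a minimal sketch, assuming a tagged-tuple encoding of annotated types; `subst` and `instantiate` are hypothetical helpers, not part of the paper. Because only the constants $u$ and $n$ are substituted here, no variable capture can occur.

```python
# Annotated types as tagged tuples (our own encoding):
#   ("int",)   ("arrow", t1, a, t2)   ("pi", gamma, t)
INT = ("int",)


def subst(t: tuple, a: str, gamma: str) -> tuple:
    """t[a/gamma]: replace the annotation variable gamma by the annotation a."""
    tag = t[0]
    if tag == "int":
        return t
    if tag == "arrow":
        _, t1, ann, t2 = t
        return ("arrow", subst(t1, a, gamma), a if ann == gamma else ann, subst(t2, a, gamma))
    if tag == "pi":
        _, bound, body = t
        if bound == gamma:  # gamma is rebound here, so it is not free below
            return t
        return ("pi", bound, subst(body, a, gamma))
    raise ValueError(f"unknown type {t!r}")


def instantiate(t: tuple, a: str) -> tuple:
    """Type of e @ a when e has type Πγ.τ, i.e. τ[a/γ]."""
    if t[0] != "pi":
        raise ValueError("annotation application needs a Π-type")
    _, gamma, body = t
    return subst(body, a, gamma)


# f : Πγ.(int →γ int) →n int →γ int
f_ty = ("pi", "g", ("arrow", ("arrow", INT, "g", INT), "n", ("arrow", INT, "g", INT)))
print(instantiate(f_ty, "u"))
```

Instantiating at `"u"` yields the type needed for the first call, in which the last argument is useless; instantiating at `"n"` yields the type for the second call.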

4.3 UVE Specification with Dependent Types 

We extend our specification of UVE by introducing two new rules:

$$\frac{\Gamma \rhd \lambda x.e : \tau \Rightarrow m \quad \gamma \notin FV(\Gamma)}{\Gamma \rhd \lambda x.e : \Pi\gamma.\tau \Rightarrow \Lambda\gamma.m} \qquad \frac{\Gamma \rhd e : \Pi\gamma.\tau \Rightarrow m}{\Gamma \rhd e : \tau[a/\gamma] \Rightarrow m \mathbin{@} a}$$

The first rule restricts annotation abstraction to function expressions, which is the only place where it is needed. The condition $\gamma \notin FV(\Gamma)$ is the expected one, ensuring that we are binding all occurrences of $\gamma$. With these rules added to our specification, we can construct a deduction describing the translation given above.
4.4 Correctness

To extend the correctness arguments from the previous section, we must first extend the definition of ordering on types to include the following rule:

$$\frac{\tau \le \tau'}{\Pi\gamma.\tau \le \Pi\gamma.\tau'}$$

The statement of Theorem 1 remains the same for stating correctness of UVE with dependent types. The proof can be found in [?].

4.5 Further Examples 

Dependent types also prove useful for handling certain recursive functions. Consider the following example [?] (using a more ML-like syntax for readability):

fun f g x y z = ifz z then (g @ x)
                else f @ (λy.3) @ y @ x @ (z-1)

Assume that f is used in a context such as

f @ (λv.v) @ Q1 @ Q2 @ Q3

in which Q2 is effect free. Without using dependent types, our system can only translate the definition of f to

fun f g x y z = ifz z then (g @n x)
                else f @n (λy.3) @n y @n x @n (z-1)

of type $(\mathrm{int} \xrightarrow{n} \mathrm{int}) \xrightarrow{n} \mathrm{int} \xrightarrow{n} \mathrm{int} \xrightarrow{n} \mathrm{int} \xrightarrow{n} \mathrm{int}$, identifying nothing as unneeded. Using dependent types, however, we can translate the function to

fun f γ g x y z = ifz z then (g @γ x)
                  else f @ u @n (λy.3) @u y @u x @n (z-1)

In this version, we identify the recursive call to f as having two useless arguments (the actual parameters y and x), while identifying the type of its first parameter to be $\mathrm{int} \xrightarrow{u} \mathrm{int}$.

In addition to statically identifying useless code, our system with dependent types supports a form of dynamic identification of useless code. While this code cannot be eliminated statically, it can be ignored at runtime. Consider the example from Section 3.2 again. That example requires two evaluations of the term N, even though one of them is useless (as an argument to f1). Using UVE with dependent types, we can transform that example into the following:

let f1 = λx.3
    f2 = λx.x+1
    g  = Λγ.λk.(k @γ N)
    h  = λk.(k @u d^int)
in (g @ u @n f1) + (g @ n @n f2) + (f1 @u d^int) +
   (h @n f1) end
Now the evaluation of the expression (g @ u @n f1) reduces to the expression ((λx.3) @u N), and we avoid the useless evaluation of N in this case. Note that we cannot statically eliminate N, because the expression (g @ n @n f2) requires the evaluation of N. This is an example of useless-variable elimination that previous systems could not detect.
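The runtime saving can be illustrated by simulating the first two summands of the example with Python closures. This is our own simulation, not an evaluator for the paper's language: N is modelled as a counting function that returns an arbitrary value (42, an assumption for the demo), and the annotation argument γ, passed as the string `"u"` or `"n"`, decides whether the operand is evaluated.

```python
calls = {"N": 0}


def N() -> int:
    """Stands for the effect-free term N; counts how often it is evaluated."""
    calls["N"] += 1
    return 42  # arbitrary value chosen for the demo


f1 = lambda x: 3      # λx.3
f2 = lambda x: x + 1  # λx.x+1


def run_source():
    """(g @ f1) + (g @ f2) with g = λk.(k @ N), call-by-value."""
    calls["N"] = 0
    g = lambda k: k(N())
    return g(f1) + g(f2), calls["N"]


def run_target():
    """(g @ u @n f1) + (g @ n @n f2) with g = Λγ.λk.(k @γ N)."""
    calls["N"] = 0

    def g(gamma):
        def apply_k(k):
            # k @γ N: the operand is evaluated only when γ = n;
            # otherwise a dummy (None here) is passed and never inspected.
            return k(N() if gamma == "n" else None)
        return apply_k

    return g("u")(f1) + g("n")(f2), calls["N"]


print(run_source())  # (46, 2)
print(run_target())  # (46, 1)
```

Both versions compute the same value, but the annotated one evaluates N only once.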

Another use of dependent types arises with operands that have an effect. In our specifications without dependent types, an argument to a function may be needed only for its effect. We say that such arguments are relevant, suggesting that their computations are relevant to the behavior of an expression, but not their values. Consider the example

let f = μf.λx.λy.ifz x then y else f @ (x+1) @ (y+y)
    g z = 3
in (g @ (f @ a @ b)) + (g @ N) end

in which N is an expensive, effect-free expression. Because f is a recursive func- 
tion, a reasonable effect analysis will conclude that the expression (f @ a @ b) 
is not effect free (due to the possibility of non-termination). Hence, we cannot 
consider it a useless argument. It is a relevant argument. Without dependent 
types, our system considers g as a function that needs its argument, and hence 
cannot eliminate N: 

let f = μf.λx.λy.ifz x then y else f @n (x+1) @n (y+y)
    g z = 3
in (g @n (f @n a @n b)) + (g @n N) end

With dependent types, we can confine g’s need for its argument to the case in which the argument has an effect:

let f = μf.λx.λy.ifz x then y else f @n (x+1) @n (y+y)
    g γ z = 3
in (g @ n @n (f @n a @n b)) + (g @ u @u d^int) end

The definition of g here looks a bit strange, as we have introduced an apparently useless parameter γ. However, the type of g is $\Pi\gamma.\mathrm{int} \xrightarrow{\gamma} \mathrm{int}$.



5 A UVE Algorithm 

We have developed and proved correct an algorithm that detects useless code. The algorithm is partitioned into three stages: type inference, constraint generation, and constraint solving, defined by the functions $\mathcal{T}$, $\mathcal{U}$, and solve, respectively.

The first stage is essentially traditional type inference for the simply-typed functional language introduced in Section 2, and so we do not explicitly define it here. Of particular interest, however, is the observation that all function types are annotated. Since no meaning can be attached to these annotations until the constraints are solved, type inference can treat the annotations as variables. The function $\mathcal{T}(e)$ returns a term $m^{\tau}$ of type $\tau$ in which each subterm of $m$ is explicitly decorated with its inferred type.
The function $\mathcal{U}$, defined in Figure 2, takes a term $m^{\tau}$ and returns a set $\Theta$ of term variables (a subset of the variables free in $m^{\tau}$), a set $\Phi$ of constraints on annotations, and a term $e'$, which is a form of $m^{\tau}$ with all applications annotated and all type decorations removed.

Detecting useless variables hinges on the constraint $x \notin FV(e')$ imposed by the abstraction rule in Figure 1. The set $FV(e')$ is the set of free variables in the translated term $e'$. In other words, $FV(e')$ is the set of free variables not detected as useless in the original term. The $\mathcal{U}$ function computes a representation of this set of free variables as the set $\Theta$.

In the application rule in Figure 2, the sets $\Theta_1$ and $\Theta_2$ represent the free variables not detected as useless in the operator and operand, respectively. In computing the corresponding set for the application, we must take the union of $\Theta_1$ and $\Theta_2$, provided that the operand is needed. If, on the other hand, the operand is useless, then $\Theta_2$ should not be included in the union.

Since we have no way of determining the usefulness of the operand at this stage of the algorithm, we delay the resolution of the issue by computing the conditional union of $\Theta_1$ and $\Theta_2$. This is done by annotating each term variable in $\Theta_2$ with the annotation variable $\gamma$ (represented in $\mathcal{U}$ by the operation $\gamma\Theta_2$). Since several union operations may take place while examining a term, a term variable may be annotated with several different annotation variables. In fact, there may be several occurrences of the same term variable (each with different annotations) within a set. Consider the following example:

^ @ [(jint^,,int @ ^int) (^int^,3int @ ^int)j 

The set 0 for E is {/i®, 

If the operand has a side effect, then it must be needed. To enforce this, the constraint $(\gamma = n)$ is added to the constraint set. The function getord returns a set of constraints that enforces the ordering on annotation variables as defined by the subtype relation in Section 3.1 and is defined as follows:

$$\mathit{getord}(\mathrm{int}, \mathrm{int}) = \{\}$$
$$\mathit{getord}(\tau_1 \xrightarrow{\gamma_1} \tau_2,\ \tau_1' \xrightarrow{\gamma_2} \tau_2') = \{(\gamma_1 \le \gamma_2)^{\emptyset}\} \cup \mathit{getord}(\tau_1', \tau_1) \cup \mathit{getord}(\tau_2, \tau_2')$$

In the application case, the constraint set $\Phi_3$ is conditionally unioned in the same way as the variable set $\Theta_2$. This is justified by the observation that, if the operand is useless, then it should not impose any constraints on the meaning of annotation variables.
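A direct transcription of getord into Python, as a sketch: types are tagged tuples of our own design, the helper `arrow` is hypothetical, and the empty annotation set attached to each generated constraint is left implicit.

```python
# Annotated types as tagged tuples: ("int",) or ("arrow", t1, gamma, t2).
INT = ("int",)


def arrow(t1: tuple, gamma: str, t2: tuple) -> tuple:
    return ("arrow", t1, gamma, t2)


def getord(t: tuple, t2: tuple) -> set:
    """Constraints (γ1, γ2), read γ1 ≤ γ2, that make t ≤ t2 hold under Definition 1."""
    if t[0] == "int" and t2[0] == "int":
        return set()
    if t[0] == "arrow" and t2[0] == "arrow":
        _, a1, g1, r1 = t
        _, a2, g2, r2 = t2
        # argument positions are compared contravariantly, results covariantly
        return {(g1, g2)} | getord(a2, a1) | getord(r1, r2)
    raise ValueError("getord: the two types must have the same shape")


print(getord(arrow(INT, "g1", INT), arrow(INT, "g2", INT)))  # {('g1', 'g2')}
```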

In the abstraction rule in Figure 2, the set of free variables not detected as useless is represented by the set $\Theta$ with all occurrences of $x$ removed (represented as the operation $\Theta \setminus x$). The abstraction rule introduces the constraint $(x \in \Theta \supset \gamma = n)$: any variable in $\Theta$ is considered needed, while all others are useless. This corresponds to the constraint $x \notin FV(e')$ in Figure 1.

The variable rule in Figure 2 simply adds an occurrence of $x$ to the set of free variables. The other rules are straightforward.

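The third stage of the algorithm solves the constraints generated during the second stage. The function solve, defined in Figure 3, takes a set of constraints and returns a substitution mapping annotation variables to annotations. All variables in $\Theta$ and all constraints in $\Phi$ are annotated with sets $\Delta$ of annotation variables. These annotations represent an element’s conditional membership in the set: if all of an element’s annotation variables are bound to $n$, then that element is considered to be in the set.

$$
\begin{array}{l}
\mathcal{U}(x^{\tau}) = (\{x^{\emptyset}\}, \{\}, x) \\[4pt]
\mathcal{U}(n^{\mathrm{int}}) = (\{\}, \{\}, n) \\[4pt]
\mathcal{U}((\lambda x.e^{\tau_2})^{\tau_1 \xrightarrow{\gamma} \tau_2}) = \\
\quad \mathbf{let}\ (\Theta, \Phi, e') = \mathcal{U}(e^{\tau_2}) \\
\quad \mathbf{in}\ (\Theta \setminus x,\ \{(x \in \Theta \supset \gamma = n)^{\emptyset}\} \cup \Phi,\ \lambda x.e') \\[4pt]
\mathcal{U}((e_1^{\tau_2 \xrightarrow{\gamma} \tau} \mathbin{@} e_2^{\tau_2'})^{\tau}) = \\
\quad \mathbf{let}\ (\Theta_1, \Phi_1, e_1') = \mathcal{U}(e_1^{\tau_2 \xrightarrow{\gamma} \tau}) \\
\quad \phantom{\mathbf{let}}\ (\Theta_2, \Phi_2, e_2') = \mathcal{U}(e_2^{\tau_2'}) \\
\quad \phantom{\mathbf{let}}\ \Phi_3 = \Phi_2 \cup \mathit{getord}(\tau_2', \tau_2) \\
\quad \phantom{\mathbf{let}}\ \Phi_4 = \mathbf{if}\ \mathit{noeffect}(e_2)\ \mathbf{then}\ \{\}\ \mathbf{else}\ \{(\gamma = n)^{\emptyset}\} \\
\quad \mathbf{in}\ (\Theta_1 \cup \gamma\Theta_2,\ \Phi_1 \cup \gamma\Phi_3 \cup \Phi_4,\ e_1' \mathbin{@_{\gamma}} e_2') \\[4pt]
\mathcal{U}((\mu f.e^{\tau})^{\tau}) = \mathbf{let}\ (\Theta, \Phi, e') = \mathcal{U}(e^{\tau})\ \mathbf{in}\ (\Theta \setminus f,\ \Phi,\ \mu f.e') \\[4pt]
\mathcal{U}((\mathtt{ifz}\ e_1^{\mathrm{int}}\ e_2^{\tau_2}\ e_3^{\tau_3})^{\tau}) = \\
\quad \mathbf{let}\ (\Theta_1, \Phi_1, e_1') = \mathcal{U}(e_1^{\mathrm{int}}) \\
\quad \phantom{\mathbf{let}}\ (\Theta_2, \Phi_2, e_2') = \mathcal{U}(e_2^{\tau_2}) \\
\quad \phantom{\mathbf{let}}\ (\Theta_3, \Phi_3, e_3') = \mathcal{U}(e_3^{\tau_3}) \\
\quad \phantom{\mathbf{let}}\ \Phi_4 = \mathit{getord}(\tau_2, \tau) \cup \mathit{getord}(\tau_3, \tau) \\
\quad \mathbf{in}\ (\Theta_1 \cup \Theta_2 \cup \Theta_3,\ \Phi_1 \cup \Phi_2 \cup \Phi_3 \cup \Phi_4,\ \mathtt{ifz}\ e_1'\ e_2'\ e_3') \\[4pt]
\mathcal{U}((e_1^{\mathrm{int}} + e_2^{\mathrm{int}})^{\mathrm{int}}) = \\
\quad \mathbf{let}\ (\Theta_1, \Phi_1, e_1') = \mathcal{U}(e_1^{\mathrm{int}}) \\
\quad \phantom{\mathbf{let}}\ (\Theta_2, \Phi_2, e_2') = \mathcal{U}(e_2^{\mathrm{int}}) \\
\quad \mathbf{in}\ (\Theta_1 \cup \Theta_2,\ \Phi_1 \cup \Phi_2,\ e_1' + e_2')
\end{array}
$$

Fig. 2. The $\mathcal{U}$ function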

Rules 1, 2, 4, and 7 in Figure 3 reduce these sets of annotations by removing annotation variables that have been bound to $n$. The remaining rules solve the constraints. Notice that a constraint’s annotation set must be empty before it can be solved. When none of the rules 1–10 can be applied, solve simply returns the substitution $\delta_u$ mapping all annotation variables to $u$. In this way, solve determines which annotation variables must be $n$ and binds all others to $u$.
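The rewrite rules can also be read as a monotone fixpoint computation. The sketch below is our own reformulation, not the authors' implementation: instead of rewriting the constraint set and substituting $n$ for $\gamma$, it records forced variables in a set, treats an annotation set as satisfied once every member is forced to $n$, and finally maps all remaining variables to $u$. The class names `Impl`, `Eq`, and `Leq` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Impl:
    """(x ∈ Θ ⊃ γ = n)^Δ, with Θ a tuple of (variable, annotation set) pairs."""
    x: str
    theta: Tuple[Tuple[str, FrozenSet[str]], ...]
    gamma: str
    delta: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Eq:
    """(γ = n)^Δ"""
    gamma: str
    delta: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Leq:
    """(γ1 ≤ γ2)^Δ"""
    lo: str
    hi: str
    delta: FrozenSet[str] = frozenset()


def solve(constraints: Iterable, ann_vars: Iterable[str]) -> Dict[str, str]:
    constraints = list(constraints)
    forced = set()  # annotation variables already bound to n

    def is_n(a: str) -> bool:
        return a == "n" or a in forced

    def enabled(delta: FrozenSet[str]) -> bool:
        # cf. rules 1, 4, 7: a condition set counts as empty once all members are n
        return all(is_n(d) for d in delta)

    changed = True
    while changed:  # terminates because `forced` only grows
        changed = False
        for c in constraints:
            if not enabled(c.delta):
                continue
            target = None
            if isinstance(c, Impl):
                # rules 2-3: some occurrence of x whose own annotations are all n
                if any(y == c.x and enabled(ds) for y, ds in c.theta):
                    target = c.gamma
            elif isinstance(c, Eq):
                target = c.gamma  # rule 5 (rule 6 if it is already n)
            elif isinstance(c, Leq) and is_n(c.lo):
                target = c.hi  # rule 8 (rules 9-10 if hi is already n)
            if target is not None and target not in ("n", "u") and target not in forced:
                forced.add(target)
                changed = True
    # rule 11: every annotation variable not forced to n is bound to u
    return {g: ("n" if g in forced else "u") for g in ann_vars}


cs = [Impl("x", (("x", frozenset()),), "g1"), Leq("g1", "g2"), Eq("g3", frozenset({"g4"}))]
print(solve(cs, ["g1", "g2", "g3", "g4"]))
# {'g1': 'n', 'g2': 'n', 'g3': 'u', 'g4': 'u'}
```

Here g3 stays $u$ because its constraint is conditional on g4, which nothing forces to $n$.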

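The complete UVE algorithm is defined as follows:

$$
\begin{array}{l}
\mathit{UVE}(e) = \mathbf{let}\ m^{\tau} = \mathcal{T}(e) \\
\phantom{\mathit{UVE}(e) = \mathbf{let}}\ (\Theta, \Phi, e') = \mathcal{U}(m^{\tau}) \\
\phantom{\mathit{UVE}(e) = \mathbf{let}}\ \delta = \mathit{solve}(\Phi) \\
\phantom{\mathit{UVE}(e) = {}} \mathbf{in}\ \delta e'
\end{array}
$$

When it is not safe to make assumptions concerning the type of $e$, all annotation variables in $\tau$ (as well as annotation variables in the types of all free variables in $e$) should be bound to $n$. To further conform to the specification in Section 3, a trivial translation translate can be defined which replaces all occurrences of $e_1 \mathbin{@_u} e_2$ in $e'$ with $e_1 \mathbin{@_u} d$, where $d$ is a variable of the same type as $e_2$.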
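The translate step is only described in prose. Assuming a tagged-tuple AST of our own, in which `("dummy",)` stands for a dummy variable $d$ whose type is omitted, it can be sketched as follows.

```python
# Translated terms as tagged tuples (our own encoding):
#   ("var", x) ("lit", n) ("lam", x, body) ("mu", f, body)
#   ("app", e1, ann, e2) with ann in {"u", "n"}
#   ("add", e1, e2) ("ifz", e1, e2, e3) ("dummy",) for a dummy variable d


def translate(e: tuple) -> tuple:
    """Replace the operand of every u-annotated application by a dummy."""
    tag = e[0]
    if tag == "app":
        _, fun, ann, arg = e
        new_arg = ("dummy",) if ann == "u" else translate(arg)
        return ("app", translate(fun), ann, new_arg)
    if tag in ("lam", "mu"):
        return (tag, e[1], translate(e[2]))
    if tag in ("add", "ifz"):
        return (tag,) + tuple(translate(sub) for sub in e[1:])
    return e  # variables, literals and dummies are unchanged


# (f1 @u M) becomes (f1 @u d)
print(translate(("app", ("var", "f1"), "u", ("var", "M"))))
# ('app', ('var', 'f1'), 'u', ('dummy',))
```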

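To prove the correctness of the algorithm, it suffices to show that the algorithm is sound with respect to the specification in Section 3. Before stating the theorem, we require a few auxiliary lemmas. The first two state the correctness of the constraint solver.

$$
\begin{array}{rl}
1. & \mathit{solve}((x \in \Theta \supset \gamma = n)^{\Delta \cup \{n\}} \cup \Phi) = \mathit{solve}((x \in \Theta \supset \gamma = n)^{\Delta} \cup \Phi) \\
2. & \mathit{solve}((x \in \{x^{\Delta \cup \{n\}}\} \cup \Theta \supset \gamma = n)^{\emptyset} \cup \Phi) = \mathit{solve}((x \in \{x^{\Delta}\} \cup \Theta \supset \gamma = n)^{\emptyset} \cup \Phi) \\
3. & \mathit{solve}((x \in \{x^{\emptyset}\} \cup \Theta \supset \gamma = n)^{\emptyset} \cup \Phi) = \mathit{solve}((\gamma = n)^{\emptyset} \cup \Phi) \\
4. & \mathit{solve}((\gamma = n)^{\Delta \cup \{n\}} \cup \Phi) = \mathit{solve}((\gamma = n)^{\Delta} \cup \Phi) \\
5. & \mathit{solve}((\gamma = n)^{\emptyset} \cup \Phi) = \mathit{solve}(\Phi[n/\gamma]) \circ \{\gamma \mapsto n\} \\
6. & \mathit{solve}((n = n)^{\emptyset} \cup \Phi) = \mathit{solve}(\Phi) \\
7. & \mathit{solve}((\gamma_1 \le \gamma_2)^{\Delta \cup \{n\}} \cup \Phi) = \mathit{solve}((\gamma_1 \le \gamma_2)^{\Delta} \cup \Phi) \\
8. & \mathit{solve}((n \le \gamma)^{\emptyset} \cup \Phi) = \mathit{solve}(\Phi[n/\gamma]) \circ \{\gamma \mapsto n\} \\
9. & \mathit{solve}((\gamma \le n)^{\emptyset} \cup \Phi) = \mathit{solve}(\Phi) \\
10. & \mathit{solve}((n \le n)^{\emptyset} \cup \Phi) = \mathit{solve}(\Phi) \\
11. & \text{otherwise, } \mathit{solve}(\Phi) = \delta_u
\end{array}
$$

Fig. 3. Solving constraints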

Lemma 1. Given a finite set of constraints $\Phi$, $\mathit{solve}(\Phi)$ always terminates.

Lemma 2. If $\mathit{solve}(\Phi) = \delta$ for any constraint set $\Phi$, then constraint $c$ is satisfied for all $c \in \delta\Phi$.

The third lemma guarantees that the constraint set generated by the function getord enforces the type ordering defined in Section 3.1.

Lemma 3. If $\mathit{getord}(\tau_1, \tau_2) \subseteq \Phi'$ and $\mathit{solve}(\Phi') = \delta$ for any types $\tau_1$ and $\tau_2$ and constraint set $\Phi'$, then $\delta\tau_1 \le \delta\tau_2$.

We also require the following two definitions:

Definition 3. The term $|m^{\tau}|$ is the term $m$ with all type decorations removed.

Definition 4. The function $\mathit{simplify}(\Theta)$ returns the set of variables $x_i$ such that $x_i^{\Delta} \in \Theta$ and $\Delta$ contains no occurrences of $u$.
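The conditional union $\gamma\Theta_2$ and Definition 4 can be sketched together. This is an illustrative encoding under our own assumptions: $\Theta$ is a list rather than a set, so that a variable may occur several times with different annotation sets, and `cond_union` and `apply_subst` are hypothetical helper names.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Θ as a list (multiset) of (term variable, annotation set) pairs.
Theta = List[Tuple[str, FrozenSet[str]]]


def cond_union(theta1: Theta, gamma: str, theta2: Theta) -> Theta:
    """Θ1 ∪ γΘ2: every element of Θ2 is additionally conditioned on γ."""
    return theta1 + [(x, ds | {gamma}) for x, ds in theta2]


def apply_subst(theta: Theta, delta: Dict[str, str]) -> Theta:
    """δΘ: replace annotation variables bound by δ with their annotations."""
    return [(x, frozenset(delta.get(d, d) for d in ds)) for x, ds in theta]


def simplify(theta: Theta) -> Set[str]:
    """Definition 4: keep x_i if some occurrence x_i^Δ has no u in Δ."""
    return {x for x, ds in theta if "u" not in ds}


theta = cond_union([("h", frozenset())], "g1", [("x", frozenset())])
print(sorted(simplify(apply_subst(theta, {"g1": "u"}))))  # ['h']
print(sorted(simplify(apply_subst(theta, {"g1": "n"}))))  # ['h', 'x']
```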

The following theorem states the correctness of the UVE algorithm by proving the algorithm’s soundness with respect to the type system in Section 3.

Theorem 2. If $\mathcal{U}(m^{\tau}) = (\Theta, \Phi, e)$, $\Phi \subseteq \Phi'$, $\mathit{solve}(\Phi') = \delta$, and $\mathit{translate}(\delta e) = e'$ for any well-typed, annotated term $m^{\tau}$ and constraint set $\Phi'$, then

1. $FV(e') = \mathit{simplify}(\delta\Theta)$;

2. $\Gamma \rhd |m^{\tau}| : \delta\tau \Rightarrow e'$, for all $\Gamma$ such that $dom(\Gamma) \supseteq FV(|m^{\tau}|)$ and $\Gamma(x) = \delta\tau'$ for all $x^{\tau'}$ free in $m^{\tau}$.
Part 1 of the theorem states that the algorithm computes the set of free variables in the translated term, which is required in Part 2 to prove that the algorithm imposes the same constraints on annotations as the inference system in Section 3. We make use of Lemma 2 to assume that we have a solution to the set $\Phi'$, from which we can assume that we have a solution to the subset $\Phi$.

The proofs of both parts of the theorem, as well as of the auxiliary lemmas, can be found in [?].

6 Conclusions 

We have presented a specification of useless-variable and useless-code elimination for a higher-order, call-by-value language and shown it to be both sound and safe. By using dependent types, we can identify code that, while not statically eliminated, can be ignored in some instances at run time. We have presented a simple algorithm based on our specification without dependent types and shown the algorithm to be sound with respect to our specification. We are working on an algorithm for UVE with dependent types that introduces a minimal number of annotation-variable abstractions.
Abstract. To achieve acceptable accuracy, many program analyses for functional programs are “property polymorphic”. That is, they can infer different input-output relations for a function at separate applications of the function, in a manner similar to type inference for a polymorphic language. We extend a property polymorphic (or “polyvariant”) method for binding-time analysis, due to Dussart, Henglein, and Mossin, so that it applies to languages with ML-style type polymorphism. The extension is non-trivial and we have implemented it for Haskell. While we follow others in specifying the analysis as a non-standard type inference, we argue that it should be realised through a translation into the well-understood domain of Boolean constraints. The expressiveness offered by Boolean constraints opens the way for smooth extensions to sophisticated language features and it allows for more accurate analysis.



1 Introduction 

The aim of this paper is to assimilate sophisticated program analysis capabilities in the context of a modern functional programming language with structured data and ML-style polymorphism. The most important capability of interest is “property polymorphism”, that is, an analyser’s ability to infer different properties for a definition f at separate uses of f, in a manner similar to type inference for a polymorphic language. For many program analyses, we need property polymorphism to achieve acceptable accuracy in the analysis. A second feature that we want is modularity of analysis, that is, the ability to produce analysis results that are context-independent, so as to support separate compilation.

“Property polymorphism” has previously been studied for a variety of program analyses for higher-order functional programs. However, the underlying language usually lacks features such as algebraic types and polymorphism. The assumption is usually that an extension, for example to ML-style polymorphism, is straightforward. Recent work, however, suggests that this is not necessarily so [?]. One aim of this paper is to show that Boolean constraints give a better handle on such extensions.

In this work we take binding-time analysis as our example. This analysis 
is used for program specialisation. The purpose of binding-time analysis is to 
identify program expressions that can be evaluated at specialisation time, based 
on information about which parts of the program’s input will be available at 
that time. Values that are known at specialisation-time are called static, while
values that will only be known later are called dynamic. 

Our interest in binding-time analysis is mainly as an example of a non-trivial program analysis in functional programming. More generally, we are interested in program analysis expressed as inference of “constrained types”. In a later section, we discuss related analyses to which our methodology also applies.

In the context of binding-time analysis, “binding-time polymorphism” is often referred to as polyvariance. Our analysis is polyvariant and extends the analysis proposed by Dussart, Henglein and Mossin [7] to polymorphically typed programs. To investigate our method’s practicality, we have implemented our binding-time analysis for the Glasgow Haskell Compiler (GHC). The implementation assigns binding-time properties to all expressions, but we do not have a specialiser.

Much work on type-based analysis of functional programs introduces new 
constraint systems to define properties of interest, proceeding to explain how 
constraint solving or simplification is done for the new system. In many cases, to 
obtain accurate analysis, it appears necessary to employ non-standard versions 
of esoteric type systems such as intersection types. 

We deviate from this path by deliberately seeking to utilise well-known constraint domains of great expressiveness. There are several advantages to using well-known constraint domains:

— Useful known theorems can be used to simplify the presentation or improve 
the algorithms. For example, much is known about the complexity of con- 
straint solving for various fragments of propositional logic. 

— Extra expressiveness may assist extensions of the analysis to new language 
features. Later in the paper, we argue that the extension to ML-style polymorphism is facilitated by the constraint view, and claim that this view helps us navigate the design space for the analysis.

— Extra expressiveness may allow for more accurate analysis. This is borne out 
by experience from other programming language paradigms [2].

— An implementation may utilise efficient representations and algorithms. (For 
this paper, we have experimented with two different Boolean solvers.) 

In the binding-time analysis literature we find many examples of monomorphic analysis for a polymorphic language, for example Mogensen’s analysis [?], as well as polyvariant analysis for a monomorphic language [?]. The situation is similar for other analyses; for example, recent progress in usage analysis deals with either usage-monomorphic analysis for a polymorphic language [?] or usage-polymorphic analysis for a monomorphic language [?]. To our knowledge, this is the first time polyvariant binding-time analysis has been developed and implemented for a polymorphic language. Also, to the best of our knowledge, we present the first formalisation of the underlying binding-time logic. Owing to limited space, we concentrate on the core of our analysis and its novel aspects. For example, we leave out the treatment of structured data, focusing on basic types and function types.

The next section introduces basic concepts concerning types and constraints. Section 3 introduces the binding-time constraint system and logic. Section 4 shows how to translate binding-time constraints to Boolean constraints. In Section 5 we extend the binding-time inference system of Dussart, Henglein and Mossin to polymorphically typed programs. The use of Boolean constraints provides us with a large design space. In Section 6 we describe alternative methods for supporting polymorphic function application. In Section 7 we discuss a method for finding fixed points by adding constraints. Section 8 discusses an implementation and concludes.

2 Preliminaries 

2.1 The Underlying Type System 

Programs are assumed to be well-typed in an underlying type system which we will refer to as UL. We consider an ML-style let-polymorphic typed language with base types Int and Bool:

$$
\begin{array}{lrcl}
\text{Types} & t & ::= & \alpha \mid \mathit{Int} \mid \mathit{Bool} \mid t \to t \\
\text{Type Schemes} & \sigma & ::= & t \mid \forall\bar{\alpha}.t \\
\text{Expressions} & e & ::= & x \mid \lambda x.e \mid e\ e \mid \mathbf{let}\ x = e\ \mathbf{in}\ e \mid x \mathbin{\sharp} \bar{t}
\end{array}
$$

Expressions also include numerals and the Boolean values True and False. We use vector notation for sequences. For example, $\bar{\alpha}$ represents a sequence $\alpha_1, \ldots, \alpha_n$ of type variables. Types in UL are referred to as underlying types.

We assume well-typed expressions to be explicitly typed. An example of a well-formed, typed expression is

$$\mathbf{let}\ f = \lambda x.x :: \forall\alpha.\alpha \to \alpha\ \mathbf{in}\ ((f \mathbin{\sharp} \mathit{Int})\ 1 :: \mathit{Int},\ (f \mathbin{\sharp} \mathit{Bool})\ \mathit{True} :: \mathit{Bool})$$

We find it helpful to have polymorphic application (denoted by $\sharp$) explicit in the language, but we let polymorphic abstraction be implicit (restricted to let-statements), to avoid cluttering expressions.

2.2 Binding-Time Types 

In line with Dussart, Henglein and Mossin, we impose a “binding-time type” structure on top of the underlying type structure. For instance, $S \xrightarrow{S} D$ describes a static function that takes a static value of base type as an argument and returns a dynamic value of base type. The structure of binding-time types reflects the structure of the underlying type system.

Annotations $b ::= \delta \mid S \mid D$

Binding-Time Types $\tau ::= \beta \mid b \mid \tau \xrightarrow{b} \tau$

Binding-Time Type Schemes $\eta ::= \tau \mid \forall\bar\beta,\bar\delta.C \Rightarrow \tau$

Note that we distinguish between annotation variables $\delta$ (which may only be instantiated to $S$ or $D$) and binding-time type variables $\beta$ (which may be instantiated to any $\tau$, including $\delta$). In practice it is not necessary for an implementation to distinguish the two, but for clarity we will do so in this presentation. We write $\bar\beta,\bar\delta = fv(\eta)$ to refer to the free binding-time and annotation variables in $\eta$. We now describe the constraint component $C$ in binding-time type schemes $\eta$.
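$$(\mathrm{Sta})\ C \vdash (S \le_a b) \qquad (\mathrm{Dyn})\ C \vdash (b \le_a D)$$

$$(\mathrm{Hyp})\ C_1, (b_1 \le_a b_2), C_2 \vdash (b_1 \le_a b_2)$$

$$(\mathrm{Refl})\ C \vdash (b \le_a b) \qquad (\mathrm{Trans})\ \frac{C \vdash (b_1 \le_a b_2) \quad C \vdash (b_2 \le_a b_3)}{C \vdash (b_1 \le_a b_3)}$$

$$(\mathrm{Bas}_w)\ C \vdash wft(b) \qquad (\mathrm{Arrow}_w)\ \frac{C \vdash (b_3 \le_f \tau_1) \quad C \vdash wft(\tau_1) \quad C \vdash (b_3 \le_f \tau_2) \quad C \vdash wft(\tau_2)}{C \vdash wft(\tau_1 \xrightarrow{b_3} \tau_2)}$$

$$(\mathrm{Arrow}_f)\ \frac{C \vdash (b_1 \le_a b_2)}{C \vdash (b_1 \le_f \tau_1 \xrightarrow{b_2} \tau_2)}$$

$$(\mathrm{Arrow}_s)\ \frac{C \vdash (b_2 \le_a b_5) \quad C \vdash (\tau_4 \le_s \tau_1) \quad C \vdash (\tau_3 \le_s \tau_6)}{C \vdash (\tau_1 \xrightarrow{b_2} \tau_3 \le_s \tau_4 \xrightarrow{b_5} \tau_6)}$$

Fig. 1. Binding-Time Constraint Rules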



2.3 Binding-Time Constraints 

Relationships among binding-time variables are captured through constraints. Primitive constraints are of the form $(x \le y)$, read as “$y$ is at least as dynamic as $x$”. There are various sorts: an ordering on annotations $(\cdot \le_a \cdot)$, a structural ordering $(\cdot \le_s \cdot)$ on binding-time types, and an auxiliary ordering $(\cdot \le_f \cdot)$ described below.

In addition, constraints can be formed using conjunction and existential quantification. More precisely, the constraint language is given by the grammar

$$C ::= (\tau \le \tau) \mid wft(\tau) \mid C \wedge C \mid \exists\bar\beta,\bar\delta.C$$

For binding-time types it is necessary to add an additional kind of constraint, stating that binding-time types are “well-formed”. If the top-most annotation of a binding-time type is dynamic then we know nothing of its components, and so they must all be dynamic too. For instance, $S \xrightarrow{D} S$ is meaningless. The rules for the constraints $wft(\cdot)$ and $(\cdot \le_f \cdot)$ ensure that binding-time types are well-formed. The relation $(b_1 \le_f \tau)$ ensures that $b_1$ is smaller than the top-most annotation of $\tau$. For monomorphic underlying types, all constraints are reducible to $(\cdot \le_a \cdot)$ constraints.

Figure 1 defines the primitive fragment of the binding-time constraint system, BTC. This is identical to the rules presented by Dussart, Henglein and Mossin. We differ, however, by explicitly allowing for existential quantification over binding-time type variables in constraints. The rules governing the existential quantification are taken from cylindric constraint systems [21]. That is,

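1. $C \vdash \exists\bar\beta,\bar\delta.C$

2. if $C_1 \vdash C_2$ then $\exists\bar\beta,\bar\delta.C_1 \vdash \exists\bar\beta,\bar\delta.C_2$

3. $\exists\bar\beta,\bar\delta.(C_1 \wedge \exists\bar\beta,\bar\delta.C_2) = \exists\bar\beta,\bar\delta.C_1 \wedge \exists\bar\beta,\bar\delta.C_2$

4. $\exists\bar\beta_1,\bar\delta_1.\exists\bar\beta_2,\bar\delta_2.C = \exists\bar\beta_2,\bar\delta_2.\exists\bar\beta_1,\bar\delta_1.C$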

The purpose of the cylindrification operation $\exists\bar\beta,\bar\delta$ is to hide (or discard) the variables $\bar\beta,\bar\delta$. During inference of binding-time types we may introduce intermediate variables, not free in the resulting binding-time types or environments. At certain stages during inference, constraints that have been imposed on these variables will no longer be of interest, and the cylindrification operator can then be employed to simplify constraints. Inference will in fact use Boolean constraints. When we later translate binding-time constraints to a Boolean form, the existential quantification will turn into existential quantification over Boolean variables.
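Over Boolean variables, cylindrification has a direct reading: $\exists x.F = F[x := \mathit{false}] \vee F[x := \mathit{true}]$ (Shannon expansion). The following Python sketch is our own illustration, not the authors' implementation; it represents a constraint as a predicate over an assignment, and the helper names `implies` and `exists` are hypothetical. It shows that hiding an intermediate variable $x$ in $(a \rightarrow x) \wedge (x \rightarrow b)$ leaves exactly $a \rightarrow b$.

```python
from itertools import product

def implies(a, b):
    """Boolean implication a -> b."""
    return (not a) or b

def exists(var, f):
    """Cylindrification over one Boolean variable (Shannon expansion):
    (exists var. f) holds iff f holds with var = False or with var = True."""
    return lambda env: f({**env, var: False}) or f({**env, var: True})

# (a -> x) /\ (x -> b), where x is an intermediate variable of no further interest
f = lambda env: implies(env["a"], env["x"]) and implies(env["x"], env["b"])
hidden = exists("x", f)

# Hiding x leaves exactly the implication a -> b.
for a, b in product([False, True], repeat=2):
    assert hidden({"a": a, "b": b}) == implies(a, b)
```

Enumerating assignments is exponential in general; it is used here only because it makes the semantics of $\exists$ easy to check.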

Example 1. The identity function $\lambda x.x$ has the polymorphic type $\forall\alpha.\alpha \rightarrow \alpha$. Its binding-time type is

$$\forall\beta_1,\beta_2,\delta.(\beta_1 \le_s \beta_2) \wedge (\delta \le_f \beta_1) \wedge (\delta \le_f \beta_2) \Rightarrow \beta_1 \xrightarrow{\delta} \beta_2$$

where the constraint $(\beta_1 \le_s \beta_2) \wedge (\delta \le_f \beta_1) \wedge (\delta \le_f \beta_2)$ describes the binding-time behaviour of the identity function. The first conjunct says that the output of the function is dynamic if its input is. The other two conjuncts express a well-formedness condition. Note that the binding-time type variables $\beta_1$ and $\beta_2$ must range over the same type structure as the variable $\alpha$. An instance of $\lambda x.x$'s type is $(\mathrm{Int} \rightarrow \mathrm{Int}) \rightarrow (\mathrm{Int} \rightarrow \mathrm{Int})$, with

$$(D \xrightarrow{S} S) \xrightarrow{S} (D \xrightarrow{S} S)$$

a possible binding-time type. This expresses that $\lambda x.x$ can take a function of binding-time type $D \xrightarrow{S} S$, and the application's context can treat the result as being of the same binding-time type. Another possible binding-time type is

$$(D \xrightarrow{S} S) \xrightarrow{S} (D \xrightarrow{S} D)$$

showing how the binding-time type system allows for coercion: it is acceptable for a static value to be treated as dynamic, if required by the context.
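For ground types (no variables) the rules of Figure 1 can be checked directly. In this Python sketch, which uses our own encoding and hypothetical function names, a ground binding-time type is `"S"`, `"D"`, or a tuple `("->", b, t1, t2)` for $t_1 \xrightarrow{b} t_2$. The base case of $\le_s$, namely $(b_1 \le_s b_2)$ iff $(b_1 \le_a b_2)$, is our assumption, since Figure 1 only shows the arrow case. The sketch confirms that $S \xrightarrow{D} S$ is ill-formed and that the second binding-time type for $\lambda x.x$ in Example 1 is a coercion of the first.

```python
def leq_a(b1, b2):
    """Annotation ordering (<=_a): S <= S, S <= D, D <= D."""
    return b1 == "S" or b2 == "D"

def top(t):
    """Top-most annotation of a ground binding-time type."""
    return t[1] if isinstance(t, tuple) else t

def leq_f(b, t):
    """Auxiliary ordering (<=_f): b is below the top-most annotation of t."""
    return leq_a(b, top(t))

def wft(t):
    """Well-formedness: a dynamic arrow must have dynamic components."""
    if not isinstance(t, tuple):
        return True                                   # rule (Bas_w)
    _, b, t1, t2 = t
    return leq_f(b, t1) and wft(t1) and leq_f(b, t2) and wft(t2)  # (Arrow_w)

def leq_s(t, u):
    """Structural ordering (<=_s); t and u are assumed to have the same shape."""
    if not isinstance(t, tuple):
        return leq_a(t, u)                            # assumed base case
    _, b2, t1, t3 = t
    _, b5, t4, t6 = u
    # rule (Arrow_s): contravariant in the argument, covariant in the result
    return leq_a(b2, b5) and leq_s(t4, t1) and leq_s(t3, t6)

ds = ("->", "S", "D", "S")                            # D -S-> S
first = ("->", "S", ds, ds)                           # (D -S-> S) -S-> (D -S-> S)
coerced = ("->", "S", ds, ("->", "S", "D", "D"))      # (D -S-> S) -S-> (D -S-> D)
print(wft(("->", "D", "S", "S")))                     # False: S -D-> S is meaningless
print(wft(first), leq_s(first, coerced))              # True True
```

The reverse direction, `leq_s(coerced, first)`, is false: a dynamic result cannot be treated as static.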

2.4 Shapes 

To relate an expression's valid binding-time type to its underlying type, we employ a “shape” system. In Example 1 the identity function's binding-time type $\forall\beta_1,\beta_2,\delta.C \Rightarrow \beta_1 \xrightarrow{\delta} \beta_2$ has shape $\forall\alpha.\alpha \rightarrow \alpha$. Formally, a shape environment $\Delta$ maps a polymorphic binding-time type variable to its corresponding underlying polymorphic type variable. If $\Delta = \{\beta_{11} : \alpha_1, \ldots, \beta_{nk_n} : \alpha_n\}$ then $domain(\Delta) = \bar\beta$
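$$(\mathrm{Bool}_\delta)\ \Delta \vdash \delta : \mathrm{Bool} \qquad (\mathrm{Int}_\delta)\ \Delta \vdash \delta : \mathrm{Int}$$

$$(\mathrm{Bool}_S)\ \Delta \vdash S : \mathrm{Bool} \qquad (\mathrm{Bool}_D)\ \Delta \vdash D : \mathrm{Bool}$$

$$(\mathrm{Int}_S)\ \Delta \vdash S : \mathrm{Int} \qquad (\mathrm{Int}_D)\ \Delta \vdash D : \mathrm{Int}$$

$$\frac{\Delta_1 \vdash \tau_1 : t_1 \quad \Delta_2 \vdash \tau_2 : t_2}{\Delta_1 \cup \Delta_2 \vdash \tau_1 \xrightarrow{b} \tau_2 : t_1 \rightarrow t_2}$$

$$(\forall)\ \frac{\Delta' = \{\beta_{11} : \alpha_1, \ldots, \beta_{1k_1} : \alpha_1, \ \ldots, \ \beta_{n1} : \alpha_n, \ldots, \beta_{nk_n} : \alpha_n\} \quad \Delta \cup \Delta' \vdash \tau : t \quad \bar\alpha \notin range(\Delta)}{\Delta \vdash \forall\bar\beta,\bar\delta.C \Rightarrow \tau : \forall\bar\alpha.t}$$

Fig. 2. Shape rules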



is the domain of $\Delta$, and $range(\Delta) = \bar\alpha$ is the range. The judgement $\Delta \vdash \eta : \sigma$ states that under shape environment $\Delta$ the binding-time type $\eta$ has shape $\sigma$. A judgement $\Delta \vdash \eta : \sigma$ is valid if it can be derived by the shape rules in Figure 2.

For simplicity, we usually omit information such as $\{\beta_1 : \alpha, \beta_2 : \alpha\}$ in a type scheme like $\forall\beta_1,\beta_2,\delta.(\beta_1 \le_s \beta_2) \wedge (\delta \le_f \beta_1) \wedge (\delta \le_f \beta_2) \Rightarrow \beta_1 \xrightarrow{\delta} \beta_2$. Shape information is easily recovered by inspection of an expression's underlying type and the corresponding binding-time type.

Shape inference will be important when performing binding-time analysis. Given an underlying type $t$, we will sometimes want the most general shape environment $\Delta$ and binding-time type $\tau$ such that $\Delta \vdash \tau : t$. We say $(\Delta, \tau)$ is more general than $(\Delta', \tau')$ iff $\Delta \vdash \tau : t$, $\Delta' \vdash \tau' : t$, and there exists a substitution $\rho$ such that $\rho\Delta = \Delta'$ and $\rho\tau = \tau'$.

Lemma 1. Given a type $t$, one can compute the most general shape environment $\Delta$ and binding-time type $\tau$ such that $\Delta \vdash \tau : t$.

Lemma 1 ensures that there exists an algorithm which computes the most general shape environment $\Delta$ and binding-time type $\tau$ given an underlying type $t$. In our presentation, we treat the algorithm as a deduction system with clauses of the form $t \vdash (\Delta, \tau)$, where $t$ is the input value and $\Delta$ and $\tau$ are the output values. Similarly, there is an algorithm which computes the most general binding-time type $\tau$, given a shape environment $\Delta$ and an underlying type $t$ (we write $\Delta, t \vdash \tau$). Finally, there is an algorithm to compute the most general shape environment $\Delta$ given a binding-time type $\tau$ and an underlying type $t$ (we write $\tau, t \vdash \Delta$).
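The algorithm behind Lemma 1, $t \vdash (\Delta, \tau)$, can be sketched as a traversal that invents a fresh variable at every position. In this Python sketch (our own encoding; the name `most_general` and the naming scheme are hypothetical), each occurrence of an underlying type variable gets its own $\beta$, recorded in the shape environment, and every base type and every arrow gets a fresh annotation variable. Numbering left to right happens to reproduce the names $\beta_1 \xrightarrow{\delta_2} \beta_3$ used for id in Example 3.

```python
import itertools

# Underlying types: "Int", "Bool", ("var", name) or ("->", t1, t2).
# Binding-time types: variable names (str) or ("->", annotation, tau1, tau2).

def most_general(t, fresh=None):
    """Return (shape, tau): a most general binding-time type tau for t and a
    shape environment mapping each fresh beta to its underlying type variable."""
    if fresh is None:
        fresh = itertools.count(1)
    if t in ("Int", "Bool"):
        return {}, f"delta{next(fresh)}"          # base type: fresh annotation
    if t[0] == "var":
        beta = f"beta{next(fresh)}"               # one beta per occurrence
        return {beta: t[1]}, beta
    if t[0] == "->":
        shape1, tau1 = most_general(t[1], fresh)
        delta = f"delta{next(fresh)}"             # annotation on the arrow
        shape2, tau2 = most_general(t[2], fresh)
        return {**shape1, **shape2}, ("->", delta, tau1, tau2)
    raise ValueError(f"not an underlying type: {t!r}")

print(most_general(("->", ("var", "a"), ("var", "a"))))
# ({'beta1': 'a', 'beta3': 'a'}, ('->', 'delta2', 'beta1', 'beta3'))
print(most_general(("->", "Int", "Int")))
# ({}, ('->', 'delta2', 'delta1', 'delta3'))
```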
3 Binding-Time Logic

We assume expressions to be well-typed in the underlying type system. Binding-time properties of expressions are specified by typing judgements of the form $C, \Gamma \vdash (e :: \sigma) : \eta$, where $C$ is in BTC, $\Gamma$ is a binding-time type environment assigning binding-time types to the free variables of $e$, $(e :: \sigma)$ is a type-annotated expression, and $\eta$ is $e$'s binding-time type. Always implicit is a shape environment $\Delta$ associating binding-time type variables with underlying type variables.

We assume that we can translate a program's initial type environment (which maps primitive functions to their underlying types) into an initial binding-time environment mapping primitive functions to their binding-time types. We require that the binding-time type of a primitive function is “directed”:

Definition 1 (Well-shapedness, Polarities, Directedness). Given a type scheme $\sigma = \forall\bar\alpha.t$, a binding-time type $\eta = \forall\bar\beta,\bar\delta.C \Rightarrow \tau$, a shape environment $\Delta$, a constraint $C'$, and a binding-time type $\tau'$:

- We say that $\eta$ is well-shaped (wrt. $\sigma$ and $\Delta$) iff $\Delta \vdash \eta : \sigma$.

- We say that $(C', \eta)$ is well-shaped iff $\eta$ is well-shaped.

- We define polarities recursively: $\beta$ appears in positive position in $\tau'$ (written $\tau'[\beta^+]$) iff $\tau' = \beta$, or else $\tau' = \tau_1 \xrightarrow{\delta} \tau_2$ and either $\tau_1[\beta^-]$ or $\tau_2[\beta^+]$. Similarly, $\beta$ appears in negative position in $\tau'$ (written $\tau'[\beta^-]$) iff $\tau' = \tau_1 \xrightarrow{\delta} \tau_2$ and either $\tau_1[\beta^+]$ or $\tau_2[\beta^-]$.

- We say that $(C', \eta)$ is directed (wrt. $\sigma$ and $\Delta$) iff (1) $\eta$ is well-shaped, (2) there exists an extension $\Delta' \supseteq \Delta$ such that $\Delta' \vdash \tau : t$, and (3) for each $\beta_1, \beta_2$ such that $C \wedge C' \vdash (\beta_1 \le_s \beta_2)$, (a) $\Delta' \vdash \beta_1 : \alpha$ and $\Delta' \vdash \beta_2 : \alpha$ (that is, $\beta_1$ and $\beta_2$ have the same type according to $\Delta'$) and (b) $\tau[\beta_1^-]$ and $\tau[\beta_2^+]$ (that is, $\beta_1$ occurs in negative position and $\beta_2$ occurs in positive position).

- We say that $\eta$ is directed iff $(true, \eta)$ is directed.

These definitions extend to type environments in the natural way.
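The polarity conditions of Definition 1 are simple recursive checks. This Python sketch (our own encoding, hypothetical function names) computes $\tau[\beta^+]$ and $\tau[\beta^-]$ and tests conditions (3a) and (3b) for a list of pairs $(\beta_1, \beta_2)$; it assumes the caller has already determined which pairs $(\beta_1 \le_s \beta_2)$ are entailed by $C \wedge C'$. Arrow annotations play no role in polarity, so the sketch uses `"S"` as a placeholder annotation for the if-then-else type of Example 2.

```python
# Binding-time types: names (str) for variables/annotations, or ("->", b, t1, t2).

def positive(beta, tau):
    """tau[beta+]: beta occurs in positive position in tau."""
    if not isinstance(tau, tuple):
        return tau == beta
    _, _, t1, t2 = tau
    return negative(beta, t1) or positive(beta, t2)

def negative(beta, tau):
    """tau[beta-]: beta occurs in negative position in tau."""
    if not isinstance(tau, tuple):
        return False
    _, _, t1, t2 = tau
    return positive(beta, t1) or negative(beta, t2)

def directed_pairs(pairs, tau, shape):
    """Conditions (3a) and (3b) for the given entailed (beta1 <=_s beta2) pairs."""
    return all(
        shape.get(b1) is not None and shape.get(b1) == shape.get(b2)
        and negative(b1, tau) and positive(b2, tau)
        for b1, b2 in pairs
    )

# Example 2: ite : delta -> beta1 -> beta2 -> beta3, all betas of shape alpha
ite = ("->", "S", "d", ("->", "S", "b1", ("->", "S", "b2", "b3")))
shape = {"b1": "a", "b2": "a", "b3": "a"}
print(directed_pairs([("b1", "b3"), ("b2", "b3")], ite, shape))  # True
print(directed_pairs([("b3", "b1")], ite, shape))                # False
```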

The directedness condition simply states that all constraints of the form $(\beta_1 \le_s \beta_2)$ describe relations between input values $\beta_1$ and output values $\beta_2$. This is naturally fulfilled by a binding-time analysis, since intuitively, information flows from negative to positive positions.

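Example 2. A directed binding-time description of if-then-else ($ite :: \mathrm{Bool} \rightarrow \alpha \rightarrow \alpha \rightarrow \alpha$) is as follows:

$$ite : \forall\delta,\beta_1,\beta_2,\beta_3.(\beta_1 \le_s \beta_3) \wedge (\beta_2 \le_s \beta_3) \wedge (\delta \le_f \beta_3) \Rightarrow \delta \rightarrow \beta_1 \rightarrow \beta_2 \rightarrow \beta_3$$

Note that the constraint $(\delta \le_f \beta_3)$ vacuously satisfies the directedness condition.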

The constraints $C$ that can appear in type schemes and on the left-hand side of the turnstile are restricted to those generated by the grammar:

$$C ::= (b_1 \le_a b_2) \mid (\tau_1 \le_s \tau_2) \mid (b \le_f \tau) \mid wft(\tau) \mid C \wedge C \mid \exists\bar\beta,\bar\delta.C$$
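$$(\mathrm{Var})\ \frac{(x : \eta) \in \Gamma \quad \Delta \vdash \eta : \sigma}{C, \Gamma \vdash (x :: \sigma) : \eta}$$

$$(\mathrm{Sub})\ \frac{C, \Gamma \vdash (e :: t) : \tau_2 \quad C \vdash (\tau_2 \le_s \tau_1) \quad C \vdash wft(\tau_1)}{C, \Gamma \vdash (e :: t) : \tau_1}$$

$$(\mathrm{Abs})\ \frac{C, \Gamma_x.x : \tau_1 \vdash (e :: t_2) : \tau_2 \quad C \vdash wft(\tau_1) \quad \Delta \vdash \tau_1 : t_1}{C, \Gamma_x \vdash (\lambda x.e :: t_1 \rightarrow t_2) : \tau_1 \xrightarrow{S} \tau_2}$$

$$(\mathrm{App})\ \frac{C, \Gamma \vdash (e_1 :: t_1 \rightarrow t_2) : (\tau_1 \xrightarrow{S} \tau_2) \quad C, \Gamma \vdash (e_2 :: t_1) : \tau_1}{C, \Gamma \vdash (e_1\ e_2 :: t_2) : \tau_2}$$

$$(\mathrm{Let})\ \frac{C, \Gamma_x \vdash (e_1 :: \sigma) : \eta \quad C, \Gamma_x.x : \eta \vdash (e_2 :: t) : \tau}{C, \Gamma_x \vdash (\mathbf{let}\ x = (e_1 :: \sigma)\ \mathbf{in}\ e_2 :: t) : \tau}$$

$$(\exists I)\ \frac{C, \Gamma \vdash (e :: t) : \tau \quad \bar\beta,\bar\delta = fv(C) \setminus fv(\Gamma, \tau)}{\exists\bar\beta,\bar\delta.C, \Gamma \vdash (e :: t) : \tau}$$

$$(\forall I)\ \frac{C \wedge D, \Gamma \vdash (e :: t) : \tau \quad \Delta \vdash \tau : t \quad \bar\beta,\bar\delta \subseteq fv(D, \tau) \setminus fv(\Gamma, C)}{C \wedge \exists\bar\beta,\bar\delta.D, \Gamma \vdash (e :: \forall\bar\alpha.t) : \forall\bar\beta,\bar\delta.D \Rightarrow \tau} \quad \text{where } \Delta \vdash \beta_{ij} : \alpha_i$$

$$(\forall E)\ \frac{C, \Gamma \vdash (x :: \forall\bar\alpha.t) : \forall\bar\beta,\bar\delta.D \Rightarrow \tau \quad \Delta \vdash \tau : t \quad inst(\Delta, \bar t, \bar\alpha) = \bar\tau \quad C \vdash [\bar\tau/\bar\beta, \bar b/\bar\delta]D}{C, \Gamma \vdash ((x \sharp \bar t) :: [\bar t/\bar\alpha]t) : [\bar\tau/\bar\beta, \bar b/\bar\delta]\tau}$$

$$(\mathrm{Fix})\ \frac{C, \Gamma_x.x : \eta \vdash (e :: t) : \eta \quad C \vdash wft(\eta) \quad \Delta \vdash \eta : t}{C, \Gamma_x \vdash ((\mathbf{fix}\ x :: t\ \mathbf{in}\ e) :: t) : \eta}$$

Fig. 3. Binding-Time Typing Rules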



where $\tau_1$ and $\tau_2$ must have the same shape. This excludes, for example, constraints of the form $(\beta \le_s S \rightarrow D)$. Such a restriction is necessary, as we do not allow for subtyping in the underlying type system UL.

Figure 3 defines the binding-time logic. $\Gamma_x$ denotes the environment $\{y : \eta \in \Gamma \mid y \neq x\}$. Most of the rules are straightforward. For example, rule (Sub) expresses the obvious subsumption (or coercion) rule for binding-time types. Rule (Abs) says that, in order to deduce the binding-time type $\tau_1 \xrightarrow{S} \tau_2$ for an expression $(\lambda x.e :: t_1 \rightarrow t_2)$, we must be able to deduce $\tau_2$ for $(e :: t_2)$ under the additional assumption that $x$ has type $\tau_1$ (which is well-formed and of shape $t_1$). Rule $(\exists I)$ allows unnecessary binding-time type variables in the constraint component of a typing judgement to be discarded. In rule $(\forall I)$, we require that quantification over $\beta$ variables is restricted to those $\beta$s related to polymorphic variables $\bar\alpha$ by the shape environment.

Rule $(\forall E)$ allows instantiation of binding-time type schemes at a variable's usage site. We need to compute the corresponding binding-time type instances for each underlying type instance. Let $\Delta$ be a shape environment, $\bar t$ a sequence of underlying types, and $\bar\alpha$ a sequence of underlying type variables. We define $inst(\Delta, \bar t, \bar\alpha) = \bar\tau$, where $\bar\tau$ are fresh binding-time types of appropriate shape: each element of $\bar\tau$ is related to the corresponding element of $\bar\beta$ by the shape environment $\Delta$. That is, $\Delta, t_i \vdash \tau_{ij}$ where $\Delta \vdash \beta_{ij} : \alpha_i$.

A judgement $C, \Gamma \vdash (e :: \sigma) : \eta$ is valid if the judgement is derivable by the rules in Figure 3.

Lemma 2 (Conservative Extension). Let $e$ be well-typed with type $\sigma$ and let the binding-time environment $\Gamma$ be well-formed and directed. Then $C, \Gamma \vdash (e :: \sigma) : \eta$ for some constraint $C$ and binding-time type $\eta$.

From here on we consider only expressions that are well-typed in UL.

The following lemma allows us to ignore well-formedness constraints in examples. We will also assume from now on that all binding-time types are directed.

Lemma 3 (Well-Formed and Directed). Let $e$ be well-typed with type $\sigma$. If $C, \Gamma \vdash (e :: \sigma) : \eta$ and $\Gamma$ is well-formed and directed, then $(C, \eta)$ is well-formed and directed.

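We define an ordering on binding-time type schemes for use in the inference algorithm (read $C'' \vdash \eta \le \eta'$ as “$\eta$ is more general than $\eta'$ in the context of constraint $C''$”):

$$C'' \vdash (\forall\bar\beta,\bar\delta.C \Rightarrow \tau) \le (\forall\bar\beta',\bar\delta'.C' \Rightarrow \tau') \quad \text{iff} \quad C'' \wedge C' \vdash \exists\bar\beta,\bar\delta.C \wedge (\tau \le_s \tau')$$

(We assume here, with no loss of generality, that there are no name clashes.)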



4 Translation to Boolean Constraints 



Ultimately, binding-time constraints are nothing more than Boolean constraints. We can read the binding-time values $S$ and $D$ as false and true, respectively. The $\delta$ variables are Boolean variables, and a constraint $(\delta_1 \le_a \delta_2)$ is simply an implication $\delta_1 \rightarrow \delta_2$. Since all valid binding-time type constraints $(\tau_1 \le_s \tau_2)$ relate types of the same shape, every binding-time constraint can also be understood as a Boolean constraint. We map a binding-time constraint $C$ to its corresponding Boolean constraint $[\![C]\!]$ as follows:
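$$\begin{array}{lcl}
[\![S]\!] &=& \mathit{false}\\
[\![D]\!] &=& \mathit{true}\\
[\![\delta]\!] &=& \delta\\
[\![(b_1 \le_a b_2)]\!] &=& [\![b_1]\!] \rightarrow [\![b_2]\!]\\
[\![(\delta \le_f \beta)]\!] &=& \delta \rightarrow \beta\\
[\![(b_1 \le_f \tau_1 \xrightarrow{b_2} \tau_2)]\!] &=& [\![b_1]\!] \rightarrow [\![b_2]\!]\\
[\![(\tau_1 \xrightarrow{b_1} \tau_2 \le_s \tau_1' \xrightarrow{b_2} \tau_2')]\!] &=& [\![b_1]\!] \rightarrow [\![b_2]\!] \wedge [\![(\tau_1' \le_s \tau_1)]\!] \wedge [\![(\tau_2 \le_s \tau_2')]\!]\\
[\![(\beta \le_s \beta')]\!] &=& \beta \rightarrow \beta'\\
[\![wft(\tau_1 \xrightarrow{b_3} \tau_2)]\!] &=& [\![(b_3 \le_f \tau_1)]\!] \wedge [\![wft(\tau_1)]\!] \wedge [\![(b_3 \le_f \tau_2)]\!] \wedge [\![wft(\tau_2)]\!]\\
[\![C_1 \wedge C_2]\!] &=& [\![C_1]\!] \wedge [\![C_2]\!]\\
[\![\exists\bar\beta,\bar\delta.C]\!] &=& \exists\bar\beta,\bar\delta.[\![C]\!]
\end{array}$$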


We note that $C \vdash C'$ iff $[\![C]\!] \models [\![C']\!]$, where $\models$ denotes entailment in propositional logic.
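The translation $[\![\cdot]\!]$ is structural, so it can be sketched in a few lines. In this Python sketch (our own encoding; all function names are hypothetical), binding-time types are names (for $S$, $D$, $\delta$ and $\beta$) or tuples `("->", b, t1, t2)`, and a Boolean constraint is a list of implications `(x, y)`, read $x \rightarrow y$. We treat $[\![wft(b)]\!]$ as true, following rule $(\mathrm{Bas}_w)$. The usage lines reproduce the instantiated constraints of Example 3 in Section 5 and the constraint $wft(\delta_1 \xrightarrow{\delta_2} \delta_3)$ discussed there.

```python
# A Boolean constraint is a list of implications (x, y), read x -> y,
# with "S" read as false and "D" read as true.

def top(t):
    """Top-most annotation of a binding-time type."""
    return t[1] if isinstance(t, tuple) else t

def tr_leq_a(b1, b2):
    return [(b1, b2)]

def tr_leq_f(b, t):
    # [(delta <=_f beta)] = delta -> beta;  [(b1 <=_f t1 -b2-> t2)] = b1 -> b2
    return [(b, top(t))]

def tr_leq_s(t, u):
    if not isinstance(t, tuple):
        return [(t, u)]                     # [(beta <=_s beta')] = beta -> beta'
    _, b1, t1, t2 = t
    _, b2, u1, u2 = u
    # annotations covariant, arguments contravariant, results covariant
    return tr_leq_a(b1, b2) + tr_leq_s(u1, t1) + tr_leq_s(t2, u2)

def tr_wft(t):
    if not isinstance(t, tuple):
        return []                           # wft(b) always holds: rule (Bas_w)
    _, b3, t1, t2 = t
    return tr_leq_f(b3, t1) + tr_wft(t1) + tr_leq_f(b3, t2) + tr_wft(t2)

f1 = ("->", "d11", "d10", "d12")            # delta10 -delta11-> delta12
f2 = ("->", "d15", "d14", "d16")            # delta14 -delta15-> delta16
print(tr_leq_s(f1, f2))                     # [('d11', 'd15'), ('d14', 'd10'), ('d12', 'd16')]
print(tr_leq_f("d13", f1))                  # [('d13', 'd11')]
print(tr_wft(("->", "d2", "d1", "d3")))     # [('d2', 'd1'), ('d2', 'd3')]
```

Every clause produced has at most one positive literal, which is why the generated constraints stay inside HORN.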

Since the class of Boolean constraints generated is a subclass of the set HORN of propositional Horn clauses, an immediate advantage of the translation is that we have linear-time algorithms for satisfiability [?], and hence for many other operations. If more sophisticated analysis calls for a larger class of Boolean constraints, then there are efficient representations and algorithms for Boolean constraints available, for example based on ROBDDs [?]. Finally, useful operations such as existential and universal quantification, conjunction and disjunction have a clear and well-understood meaning.
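As an illustration of why the Horn restriction pays off, the following Python sketch (our own, not the authors' solver; `entails` is a hypothetical name) decides $[\![C]\!] \models (x \rightarrow y)$ for a conjunction of implications over Boolean variables and the constants $S$ (false) and $D$ (true). It computes the least model of $[\![C]\!] \wedge x$ by forward chaining, which takes time linear in the number of clauses.

```python
from collections import defaultdict, deque

def entails(imps, x, y):
    """Decide [C] |= (x -> y), where C is a list of implications (a, b)
    over Boolean variables and the constants S (false) and D (true).
    Forward chaining from x (and D) yields the least model of C /\\ x;
    x -> y is entailed iff y is true in it, or S is reached (C /\\ x unsatisfiable)."""
    if x == "S" or y == "D":
        return True
    succ = defaultdict(list)
    for a, b in imps:
        succ[a].append(b)
    seen = {x, "D"}
    queue = deque(seen)
    while queue:
        for w in succ[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return y in seen or "S" in seen

C = [("d1", "d2"), ("d2", "d3"), ("D", "d4")]
print(entails(C, "d1", "d3"))  # True
print(entails(C, "d3", "d1"))  # False
print(entails(C, "d9", "d4"))  # True: d4 is forced to be dynamic
```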



5 Binding-Time Inference 

We assume that we are given a well-typed program, each subexpression annotated with its underlying type. Binding-time inference computes the missing binding-time information. The inference algorithm, Figure 4, is formulated as a deduction system over clauses of the form

$$\Gamma, e :: t \vdash_{inf} (C, \tau)$$

with a binding-time type environment $\Gamma$ and a type-annotated expression $e$ as input, and a binding-time constraint $C$ and a binding-time type $\tau$ as output. An algorithm in the style of algorithm W can be derived from this specification in a straightforward way.

All rules are syntax-directed except rule $(\exists\ \mathrm{Intro})$. This rule is justified by the corresponding rule in the logical system. We assume that rule $(\exists\ \mathrm{Intro})$ is applied aggressively, so that useless variables do not appear in analysis results. Rule (Var-$\lambda$) handles lambda-bound variables, whereas rule (Var-Inst) handles instantiation of let-bound variables. Considering the binding-time logic of Figure 3, the straightforward approach to polymorphism is to instantiate polymorphic binding-time variables in both constraints and binding-time types with a $\tau$ of the appropriate shape.
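$$(\mathrm{Var\text{-}Inst})\ \frac{x : \forall\bar\beta,\bar\delta.C \Rightarrow \tau \in \Gamma \quad t' = [\bar t/\bar\alpha]t \quad \tau, t \vdash \Delta \quad inst(\Delta, \bar t, \bar\alpha) = \bar\tau \quad \bar\delta'\ \text{new} \quad C' = [\![[\bar\tau/\bar\beta, \bar\delta'/\bar\delta]C]\!] \quad \tau' = [\bar\tau/\bar\beta, \bar\delta'/\bar\delta]\tau}{\Gamma, ((x :: \forall\bar\alpha.t) \sharp \bar t) :: t' \vdash_{inf} (C', \tau')}$$

$$(\mathrm{Var\text{-}}\lambda)\ \frac{(x : \tau) \in \Gamma \quad C = [\![(\tau \le_s \tau') \wedge wft(\tau')]\!] \quad t \vdash (\Delta, \tau')}{\Gamma, x :: t \vdash_{inf} (C, \tau')}$$

$$(\mathrm{Abs})\ \frac{\Gamma_x.x : \tau_1, e :: t_2 \vdash_{inf} (C, \tau_2) \quad t_1 \vdash (\Delta_1, \tau_1) \quad \delta\ \text{new}}{\Gamma_x, \lambda x.e :: t_1 \rightarrow t_2 \vdash_{inf} (C \wedge [\![wft(\tau_1 \xrightarrow{\delta} \tau_2)]\!], \tau_1 \xrightarrow{\delta} \tau_2)}$$

$$(\mathrm{App})\ \frac{\Gamma, e_1 :: t_1 \vdash_{inf} (C_1, \tau_1) \quad \Gamma, e_2 :: t_2 \vdash_{inf} (C_2, \tau_2) \quad t_3 \vdash (\Delta_3, \tau_3) \quad \delta\ \text{new} \quad C = C_1 \wedge C_2 \wedge [\![(\tau_1 \le_s \tau_2 \xrightarrow{\delta} \tau_3) \wedge wft(\tau_3)]\!]}{\Gamma, ((e_1 :: t_1)(e_2 :: t_2)) :: t_3 \vdash_{inf} (C, \tau_3)}$$

$$(\mathrm{Let})\ \frac{\Gamma_x, e_1 :: t_1 \vdash_{inf} (C_1, \tau_1) \quad gen(C_1, \Gamma_x, \tau_1, t_1, \bar\alpha) = (C_0, \eta_0) \quad \Gamma_x.x : \eta_0, e_2 :: t_2 \vdash_{inf} (C_2, \tau_2) \quad C = C_0 \wedge C_2}{\Gamma_x, (\mathbf{let}\ x = (e_1 :: \forall\bar\alpha.t_1)\ \mathbf{in}\ e_2) :: t_2 \vdash_{inf} (C, \tau_2)}$$

$$(\mathrm{Fix})\ \frac{t \vdash (\Delta, \tau) \quad \bar\delta = fv(\tau) \quad \eta_0 = \forall\bar\delta.[\![wft(\tau)]\!] \Rightarrow \tau \quad \mathcal{F}(\Gamma_x.x : \eta_0, e :: t) = (C, \tau)}{\Gamma_x, (\mathbf{fix}\ x :: t\ \mathbf{in}\ e) :: t \vdash_{inf} (C, \tau)}$$

$$(\exists\ \mathrm{Intro})\ \frac{\Gamma, e :: t \vdash_{inf} (C, \tau) \quad \bar\beta,\bar\delta = fv(C) \setminus fv(\Gamma, \tau)}{\Gamma, e :: t \vdash_{inf} (\exists\bar\beta,\bar\delta.C, \tau)}$$

Fig. 4. Binding-Time Inference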



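Example 3. Consider the (type-annotated) polymorphic application

$$((id \sharp \mathrm{Int} \rightarrow \mathrm{Int}) :: (\mathrm{Int} \rightarrow \mathrm{Int}) \rightarrow (\mathrm{Int} \rightarrow \mathrm{Int}))\ (g :: \mathrm{Int} \rightarrow \mathrm{Int}) :: \mathrm{Int} \rightarrow \mathrm{Int}$$

where $id :: \forall\alpha.\alpha \rightarrow \alpha$. Here are binding-time descriptions for $id$, $g$ and $(id\ g)$:

$$\begin{array}{lcl}
id &:& ((\beta_1 \le_s \beta_3) \wedge (\delta_2 \le_f \beta_1) \wedge (\delta_2 \le_f \beta_3),\ \beta_1 \xrightarrow{\delta_2} \beta_3)\\
g &:& ((\delta_5 \le_a \delta_4) \wedge (\delta_5 \le_a \delta_6),\ \delta_4 \xrightarrow{\delta_5} \delta_6)\\
id\ g &:& \delta_7 \xrightarrow{\delta_8} \delta_9
\end{array}$$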
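We need to build an instance of $id$ where all polymorphic binding-time variables of shape $\alpha$ are instantiated to binding-time types of shape $(\mathrm{Int} \rightarrow \mathrm{Int})$. We substitute $\delta_{10} \xrightarrow{\delta_{11}} \delta_{12}$ for $\beta_1$, $\delta_{14} \xrightarrow{\delta_{15}} \delta_{16}$ for $\beta_3$, and $\delta_{13}$ for $\delta_2$, resulting in

$$(id \sharp \mathrm{Int} \rightarrow \mathrm{Int}) :: (\mathrm{Int} \rightarrow \mathrm{Int}) \rightarrow (\mathrm{Int} \rightarrow \mathrm{Int}) : (\delta_{10} \xrightarrow{\delta_{11}} \delta_{12}) \xrightarrow{\delta_{13}} (\delta_{14} \xrightarrow{\delta_{15}} \delta_{16})$$

under the “instantiated” constraints

$$((\delta_{10} \xrightarrow{\delta_{11}} \delta_{12}) \le_s (\delta_{14} \xrightarrow{\delta_{15}} \delta_{16})) \wedge (\delta_{13} \le_f (\delta_{10} \xrightarrow{\delta_{11}} \delta_{12})) \wedge (\delta_{13} \le_f (\delta_{14} \xrightarrow{\delta_{15}} \delta_{16}))$$

These constraints are translated into Boolean constraints using the constraint rules of Figure 1 and Section 4. We then proceed to analyse $(id\ g)$ by applying the usual inference rules.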

Rule (Let) introduces annotation polymorphism as well as type polymorphism. Note that we cannot simply generalise over all the free binding-time type variables, because the expression might not be annotated with its principal type.



Example 4. Consider the following type-annotated program:

g :: ∀α1.α1 → ((α1, Int), (α1, Bool))
g x = let f :: ∀α2.α1 → α2 → (α1, α2)
          f y z = (y, z)
      in ((f ♯ Int) x 1, (f ♯ Bool) x True)

Whatever binding-time variable $\beta$ we assign to $y$, we are not allowed to quantify over $\beta$.

Therefore a polymorphic binding-time variable is only considered free (and hence generalisable) if its corresponding polymorphic type variable is free.

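We define a generalisation function giving the generalised type scheme and the generalised constraint. Let $C$ be a constraint, $\Gamma$ a type environment, $\tau$ a binding-time type, $t$ an underlying type and $\bar\alpha$ a sequence of underlying type variables. Then

$$gen(C, \Gamma, \tau, t, \bar\alpha) = (\exists\bar\beta,\bar\delta.C,\ \forall\bar\beta,\bar\delta.C \Rightarrow \tau)$$

where

- $\tau, t \vdash \Delta$

- $\bar\beta = \{\beta \mid \Delta \vdash \beta : \alpha \text{ for some } \alpha \in \bar\alpha\}$

- $\bar\delta = fv(C, \tau) \setminus (fv(\Gamma) \cup domain(\Delta))$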

We first associate all free binding-time type variables with all free underlying type variables. Then, a binding-time variable $\beta$ is free if the corresponding type variable $\alpha$ is free. It remains to compute all free annotation variables $\bar\delta$. We further note that we push the whole constraint $C$ into the type scheme (one could be more efficient by pushing in only the affected constraints).
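The variable selection in $gen$ amounts to two set operations. This Python sketch (our own illustration; `gen_vars` and its parameters are hypothetical) computes $\bar\beta$ and $\bar\delta$ from a shape environment, the type variables being generalised, $fv(C, \tau)$ and $fv(\Gamma)$. The usage mirrors Example 4: $f$'s first argument has a $\beta$ of shape $\alpha_1$, which is not generalised; the annotation variables `delta3` and `delta4` are made-up values for illustration.

```python
def gen_vars(shape, alphas, fv_c_tau, fv_env):
    """Variables quantified by gen (a sketch):
    - betas: those whose underlying type variable is being generalised;
    - deltas: free in C or tau, but neither free in the environment nor a beta."""
    betas = {b for b, a in shape.items() if a in alphas}
    deltas = set(fv_c_tau) - (set(fv_env) | set(shape))
    return betas, deltas

# Example 4: f :: forall a2. a1 -> a2 -> (a1, a2); only a2 is generalised.
shape = {"beta1": "a1", "beta2": "a2"}
betas, deltas = gen_vars(shape, {"a2"},
                         fv_c_tau={"beta1", "beta2", "delta3", "delta4"},
                         fv_env={"beta1", "delta3"})
print(sorted(betas), sorted(deltas))  # ['beta2'] ['delta4']
```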
In rules (Var-$\lambda$), (Abs), (App), and (Fix), given an underlying type $t$, we generate a most general binding-time type $\tau$ (the most general shape environment $\Delta$ is not of interest) such that $\Delta \vdash \tau : t$. Moreover, we add the constraint $wft(\tau)$ to the output constraint to ensure well-formedness of the newly generated binding-time type. For example, a $\lambda$-bound variable of type $\mathrm{Int} \rightarrow \mathrm{Int}$ could give rise to a binding-time type of $\delta_1 \xrightarrow{\delta_2} \delta_3$. The constraint $wft(\delta_1 \xrightarrow{\delta_2} \delta_3)$ reduces to $\delta_2 \rightarrow \delta_1 \wedge \delta_2 \rightarrow \delta_3$.

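With (Fix), we follow Dussart, Henglein and Mossin [?], performing Kleene-Mycroft iteration until a fixed point is found. Define¹

$$\mathcal{F}(\Gamma_x.x : \eta_i, e :: t) = \begin{cases} \mathcal{F}(\Gamma_x.x : \eta_{i+1}, e :: t) & \text{if } true \vdash \eta_i < \eta_{i+1}\\ (C_i, \tau_i) & \text{if } true \vdash \eta_i = \eta_{i+1} \end{cases}$$

where $\Gamma_x.x : \eta_i, e :: t \vdash_{inf} (C_i, \tau_i)$ and $(\cdot, \eta_{i+1}) = gen(C_i, \Gamma_x.x : \eta_i, \tau_i)$.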

Note that the sequence $\eta_0 < \cdots < \eta_i < \cdots$ is necessarily finite: the binding-time constraint rules (Figure 1) ensure that only binding-time types of the same shape are comparable. Moreover, as the generated binding-time type schemes are of the form $\forall\bar\delta.C \Rightarrow \tau$, that is, the quantifier is over annotation variables only (not binding-time type variables), each only has a finite number of instances.
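The shape of the iteration can be sketched independently of the analysis proper. In the following Python sketch (our own illustration, not the authors' code), a scheme's constraint is modelled as a transitively closed set of implications over a fixed, finite set of annotation variables. For implications without the constants $S$ and $D$, two closed sets are equal exactly when the constraints are equivalent, so equality serves as the stopping test $\eta_i = \eta_{i+1}$. The `step` function is a hypothetical stand-in for “analyse the body under $\eta_i$, then generalise”; its implications are the ones Example 5 ends up with.

```python
def closure(imps):
    """Transitive closure of a set of implications (a, b), read a -> b."""
    imps = set(imps)
    changed = True
    while changed:
        changed = False
        for a, b in list(imps):
            for c, d in list(imps):
                if b == c and a != d and (a, d) not in imps:
                    imps.add((a, d))
                    changed = True
    return frozenset(imps)

def kleene_mycroft(step, eta0, max_iter=100):
    """Iterate eta_{i+1} = step(eta_i) until two successive approximations coincide.
    Termination follows because there are only finitely many closed sets."""
    eta = eta0
    for _ in range(max_iter):
        nxt = step(eta)
        if nxt == eta:
            return eta
        eta = nxt
    raise RuntimeError("no fixed point within max_iter rounds")

# Toy stand-in for one round of analysing the body of g in Example 5.
body = {("d7", "d28"), ("d8", "d26"), ("d9", "d27")}
step = lambda eta: closure(eta | body)
print(sorted(kleene_mycroft(step, frozenset())))
# [('d7', 'd28'), ('d8', 'd26'), ('d9', 'd27')]
```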

Example 5. Consider the binding-time analysis of the following program

g :: Bool → Int → Int → (Int, Int)
g p x y = if p then (x,y) else (snd (g p y x), y)

or, in our syntax,

fix g in λp.λx.λy.ite p (x,y) (snd (g p y x),y)

where

- $(e_1, e_2)$ builds a pair from two expressions.

- snd returns the second element of a pair.

- ite is a built-in if-then-else function which returns a pair of Ints. (Two extensions of the algorithm, supported by our implementation, allow a more natural definition: (1) if-then-else can be written using a built-in case expression, and (2) polymorphic application allows us to use the usual if-then-else function and instantiate it to the pair in this context.)

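Fixed point calculation starts with a fresh binding-time type scheme $\eta_0$ for $g$:

$$g : \forall\bar\delta.C \Rightarrow \delta_1 \xrightarrow{\delta_7} \delta_2 \xrightarrow{\delta_8} \delta_3 \xrightarrow{\delta_9} (\delta_4, \delta_5)^{\delta_6}$$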

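¹ In terms of Boolean constraints, the binding-time type scheme ordering given at the end of Section 3 translates to: $(\forall\bar\delta_1.C_1 \Rightarrow \tau_1) \le (\forall\bar\delta_2.C_2 \Rightarrow \tau_2)$ iff $C_2 \models \exists\bar\delta_1.(C_1 \wedge [\![(\tau_1 \le_s \tau_2)]\!])$. (Without loss of generality we assume there are no name clashes between $\bar\delta_1$ and $\bar\delta_2$.)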
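$C$ consists entirely of well-formedness constraints:

$$\delta_7 \rightarrow \delta_1 \wedge \delta_7 \rightarrow \delta_8 \wedge \delta_8 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_9 \wedge \delta_9 \rightarrow \delta_3 \wedge \delta_9 \rightarrow \delta_6 \wedge \delta_6 \rightarrow \delta_4 \wedge \delta_6 \rightarrow \delta_5$$

However, this constraint plays no further part in this example. Indeed, well-formedness constraints have no impact in this example, so to simplify the presentation, let us treat $\eta_0$ as simply

$$\forall\bar\delta.true \Rightarrow \delta_1 \rightarrow \delta_2 \rightarrow \delta_3 \rightarrow (\delta_4, \delta_5)^{\delta_6}$$

and consider that the initial binding-time environment $\Gamma_0$ contains:

$$snd : \forall\bar\delta.\ \delta_2 \rightarrow \delta_4 \Rightarrow (\delta_1, \delta_2)^{\delta_3} \rightarrow \delta_4$$

$$ite : \forall\bar\delta.\ \delta_1 \rightarrow \delta_{10} \wedge \delta_2 \rightarrow \delta_8 \wedge \delta_3 \rightarrow \delta_9 \wedge \delta_4 \rightarrow \delta_{10} \wedge \delta_5 \rightarrow \delta_8 \wedge \delta_6 \rightarrow \delta_9 \wedge \delta_7 \rightarrow \delta_{10} \Rightarrow \delta_1 \rightarrow (\delta_2, \delta_3)^{\delta_4} \rightarrow (\delta_5, \delta_6)^{\delta_7} \rightarrow (\delta_8, \delta_9)^{\delta_{10}}$$

Similarly, we shall ignore the variable renaming in the (Var-$\lambda$) rule.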

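Inference now proceeds as follows. The body of the lambda abstraction is analysed in the binding-time environment

$$\Gamma' = \Gamma_0, g : \eta_0, p : \delta_7, x : \delta_8, y : \delta_9$$

For the then-branch we have

$$\Gamma', (x,y) \vdash_{inf} (true, (\delta_8, \delta_9)^{\delta_{10}})$$

That is, no constraints are contributed by this subexpression. For the else-branch, consider first the sub-expression $g\ p\ y\ x$. Three applications of rule (App) effectively introduce a binding-time type $(\delta_{11}, \delta_{12})^{\delta_{13}}$ for the sub-expression, together with the constraint

$$[\![(\delta_{14} \rightarrow \delta_{15} \rightarrow \delta_{16} \rightarrow (\delta_{17}, \delta_{18})^{\delta_{19}} \le_s \delta_7 \rightarrow \delta_9 \rightarrow \delta_8 \rightarrow (\delta_{11}, \delta_{12})^{\delta_{13}})]\!]$$

Notice how a fresh binding-time type $\delta_{14} \rightarrow \delta_{15} \rightarrow \delta_{16} \rightarrow (\delta_{17}, \delta_{18})^{\delta_{19}}$ has been introduced via rule (Var-Inst).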

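The constraint translates to

$$\delta_7 \rightarrow \delta_{14} \wedge \delta_9 \rightarrow \delta_{15} \wedge \delta_8 \rightarrow \delta_{16} \wedge \delta_{17} \rightarrow \delta_{11} \wedge \delta_{18} \rightarrow \delta_{12} \wedge \delta_{19} \rightarrow \delta_{13}$$

However, since variables $\delta_{14}$ to $\delta_{19}$ are of no further interest (they are free neither in the environment nor in the result type $(\delta_{11}, \delta_{12})^{\delta_{13}}$), we can existentially quantify them (rule $(\exists\ \mathrm{Intro})$):

$$\exists\delta_{14} \cdots \delta_{19}.\ \delta_7 \rightarrow \delta_{14} \wedge \delta_9 \rightarrow \delta_{15} \wedge \delta_8 \rightarrow \delta_{16} \wedge \delta_{17} \rightarrow \delta_{11} \wedge \delta_{18} \rightarrow \delta_{12} \wedge \delta_{19} \rightarrow \delta_{13}$$

which is equivalent to true. Hence we have

$$\Gamma', g\ p\ y\ x \vdash_{inf} (true, (\delta_{11}, \delta_{12})^{\delta_{13}})$$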
Similarly, for $snd\ (g\ p\ y\ x)$ we introduce the result type $\delta_{20}$ and generate the constraint

$$\delta_{22} \rightarrow \delta_{24} \wedge [\![((\delta_{21}, \delta_{22})^{\delta_{23}} \rightarrow \delta_{24} \le_s (\delta_{11}, \delta_{12})^{\delta_{13}} \rightarrow \delta_{20})]\!]$$

Again, once we have translated to a Boolean constraint and eliminated uninteresting variables, we are left with the vacuous constraint true:

$$\Gamma', snd\ (g\ p\ y\ x) \vdash_{inf} (true, \delta_{20})$$

It follows that

$$\Gamma', (snd\ (g\ p\ y\ x), y) \vdash_{inf} (true, (\delta_{20}, \delta_9)^{\delta_{25}})$$

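Calling the result of $g$'s body $(\delta_{26}, \delta_{27})^{\delta_{28}}$, and introducing a fresh instance of the type for $ite$, we get

$$[\![(\delta_{29} \rightarrow (\delta_{30}, \delta_{31})^{\delta_{32}} \rightarrow (\delta_{33}, \delta_{34})^{\delta_{35}} \rightarrow (\delta_{36}, \delta_{37})^{\delta_{38}} \le_s \delta_7 \rightarrow (\delta_8, \delta_9)^{\delta_{10}} \rightarrow (\delta_{20}, \delta_9)^{\delta_{25}} \rightarrow (\delta_{26}, \delta_{27})^{\delta_{28}})]\!]$$

with the additional constraint

$$\delta_{29} \rightarrow \delta_{38} \wedge \delta_{30} \rightarrow \delta_{36} \wedge \delta_{31} \rightarrow \delta_{37} \wedge \delta_{32} \rightarrow \delta_{38} \wedge \delta_{33} \rightarrow \delta_{36} \wedge \delta_{34} \rightarrow \delta_{37} \wedge \delta_{35} \rightarrow \delta_{38}$$

The structural constraint translates to

$$\delta_7 \rightarrow \delta_{29} \wedge \delta_8 \rightarrow \delta_{30} \wedge \delta_9 \rightarrow \delta_{31} \wedge \delta_{10} \rightarrow \delta_{32} \wedge \delta_{20} \rightarrow \delta_{33} \wedge \delta_9 \rightarrow \delta_{34} \wedge \delta_{25} \rightarrow \delta_{35} \wedge \delta_{36} \rightarrow \delta_{26} \wedge \delta_{37} \rightarrow \delta_{27} \wedge \delta_{38} \rightarrow \delta_{28}$$

However, only $\delta_7$, $\delta_8$, $\delta_9$, $\delta_{26}$, $\delta_{27}$, and $\delta_{28}$ are of interest. Existential quantification over the remaining variables yields

$$\delta_7 \rightarrow \delta_{28} \wedge \delta_8 \rightarrow \delta_{26} \wedge \delta_9 \rightarrow \delta_{27}$$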
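For conjunctions of binary implications, existential quantification can be done by resolution: $\exists x.C$ keeps the clauses not mentioning $x$ and adds $a \rightarrow b$ for every pair $a \rightarrow x$, $x \rightarrow b$. This is exact for such 2-literal clauses. The Python sketch below is our own illustration (the names `eliminate`, `project` and `chain` are hypothetical); it recomputes both projections of Example 5, writing `d7` for $\delta_7$ and so on.

```python
def eliminate(imps, var):
    """exists var. C, for C a set of implications (a, b) read a -> b:
    keep the clauses without var and add a -> b for each a -> var, var -> b."""
    pre = {a for a, b in imps if b == var and a != var}
    post = {b for a, b in imps if a == var and b != var}
    kept = {(a, b) for a, b in imps if var not in (a, b)}
    new = {(a, b) for a in pre for b in post}
    # drop implications that are trivially true: x -> x, S -> _, _ -> D
    return {(a, b) for a, b in kept | new if a != b and a != "S" and b != "D"}

def project(imps, keep):
    """Existentially quantify every variable of imps not in keep (rule (exists Intro))."""
    result = set(imps)
    for v in sorted({x for clause in imps for x in clause} - set(keep) - {"S", "D"}):
        result = eliminate(result, v)
    return result

def chain(*pairs):
    """Implications from pairs of indices, e.g. (7, 14) stands for d7 -> d14."""
    return {(f"d{a}", f"d{b}") for a, b in pairs}

# g p y x: eliminating d14..d19 leaves true (the empty conjunction).
gpyx = chain((7, 14), (9, 15), (8, 16), (17, 11), (18, 12), (19, 13))
print(project(gpyx, {f"d{n}" for n in (7, 8, 9, 11, 12, 13)}))  # set()

# Body of g: ite's instantiated constraint plus the structural constraint.
ite_inst = chain((29, 38), (30, 36), (31, 37), (32, 38), (33, 36), (34, 37), (35, 38))
structural = chain((7, 29), (8, 30), (9, 31), (10, 32), (20, 33), (9, 34),
                   (25, 35), (36, 26), (37, 27), (38, 28))
body = project(ite_inst | structural, {f"d{n}" for n in (7, 8, 9, 26, 27, 28)})
print(sorted(body))  # [('d7', 'd28'), ('d8', 'd26'), ('d9', 'd27')]
```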

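For the lambda abstraction we therefore obtain

$$\Gamma_0, g : \eta_0, \lambda p.\lambda x.\lambda y.ite\ p\ (x,y)\ (snd\ (g\ p\ y\ x), y) \vdash_{inf} (\delta_7 \rightarrow \delta_{28} \wedge \delta_8 \rightarrow \delta_{26} \wedge \delta_9 \rightarrow \delta_{27},\ \delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{26}, \delta_{27})^{\delta_{28}})$$

Generalising, we have a new binding-time type scheme $\eta_1$ for $g$:

$$g : \forall\bar\delta.\ \delta_7 \rightarrow \delta_{28} \wedge \delta_8 \rightarrow \delta_{26} \wedge \delta_9 \rightarrow \delta_{27} \Rightarrow \delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{26}, \delta_{27})^{\delta_{28}}$$

(ignoring well-formedness constraints).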

Since $\eta_1 \neq \eta_0$, the fixed point calculation needs to continue. The effect of $\eta_1$ is to add new constraints when analysing $g\ p\ y\ x$. We leave it as an exercise for the reader to show that $\eta_1$ is a fixed point. (Hint: snd removes the constraints we did not already have.)
We can state soundness and completeness results for well-typed programs. Soundness means that for every deduction in the inference system, we can find an “equivalent” deduction in the logic. We note that the constraints derived in the inference system are Boolean constraints, whereas constraints in the logical system are in BTC. Therefore, equivalence here means that the constraints derived in both deduction systems are (semantically) equivalent after translating the constraints in the logical system into Boolean constraints.

Theorem 1 (Soundness of Inference). Let $\Gamma, e :: \sigma \vdash_{inf} (C, \tau)$. Then $C', \Gamma \vdash (e :: \sigma) : \eta$ for some $C'$ such that $gen(C, \Gamma, \tau) = (C_0, \eta)$ and $[\![C']\!] = C_0$.

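Completeness states that every deduction derivable in the logical system is subsumed by a deduction in the inference system.

Theorem 2 (Completeness of Inference). Let $C, \Gamma \vdash (e :: \sigma) : \forall\bar\beta_1,\bar\delta_1.C_1 \Rightarrow \tau_1$. Let $\Gamma, e :: \sigma \vdash_{inf} (C_2, \tau_2)$. Then $[\![C \wedge C_1]\!] \models \exists\bar\beta_2,\bar\delta_2.(C_2 \wedge [\![(\tau_2 \le_s \tau_1)]\!])$, where $\bar\beta_2,\bar\delta_2 = fv(C_2, \tau_2) \setminus fv(\Gamma)$.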

In addition, we can state that inference yields principal types.

Definition 2. Given a pair $(\Gamma, e :: \sigma)$ consisting of a binding-time environment $\Gamma$ and a type-annotated expression $(e :: \sigma)$, together with a pair $(C, \eta)$, where $C$ is a binding-time constraint and $\eta$ is a binding-time type, $(C, \eta)$ is a principal type of $(\Gamma, e :: \sigma)$ iff

1. $C, \Gamma \vdash (e :: \sigma) : \eta$

2. whenever $C', \Gamma \vdash (e :: \sigma) : \eta'$, we have $C' \vdash C$ and $C' \vdash \eta \le \eta'$.

(The ordering on binding-time type schemes was defined at the end of Section 3.) Note that principality is defined with respect to a given type-annotated expression. Hence, annotating an expression with a more general (underlying) type may result in a more general binding-time type. A principal binding-time type represents a class of types which are semantically equivalent. This is in contrast to [?], where principal types are syntactically unique.

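Corollary 1 (Principal Types). Let $\eta = \forall\bar\beta,\bar\delta.C \Rightarrow \tau$. Assume

$$true, \emptyset \vdash (e :: \sigma) : \eta$$

where $e$ is a closed expression. Let $(true, \eta)$ be the principal type of $(\emptyset, e :: \sigma)$. Then $\emptyset, e :: \sigma \vdash_{inf} (C, \tau)$.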

6 Alternative Methods for Handling Polymorphism 

With certain constraint solvers the instantiate method (rule (Var-Inst)) is impractical. They store the constraints in an optimised internal form; reverse-engineering this representation to find which relations hold between polymorphic variables may be very inefficient, if not impossible.

However, it is always possible for the solver to determine whether a constraint set entails a particular relationship amongst variables, and many solvers can do this efficiently. The Test-Add method uses entailment to avoid constraint instantiation.
6.1 The Test-Add Method

Instead of instantiating variables in the constraint component of the type scheme $\forall\bar\beta,\bar\delta.C \Rightarrow \tau$, the “Test-Add” method queries which implications hold between elements of $\bar\beta$ in $C$. For those that hold, we add inequalities between the corresponding instantiating $\tau$s to $C$.

Note that we have two forms of constraints involving polymorphic binding-time variables: $\beta \rightarrow \beta'$ (connecting two polymorphic binding-time variables), and $\delta \rightarrow \beta$ (a well-formedness condition, or generated from a conditional expression such as if-then-else).

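Consider a binding-time type scheme $\eta = \forall\bar\beta,\bar\delta.C \Rightarrow \tau$, an underlying type scheme $\sigma = \forall\bar\alpha.t$, a shape environment $\Delta$ and a sequence of instantiation types $\tau_{ij}$ such that $\Delta \vdash \tau : t$ and, for each $\Delta \vdash \beta_{ij} : \alpha_i$, it holds that $\Delta \vdash \tau_{ij} : t_i$. We define

$$\mathcal{T}_{\alpha_i}(\Delta, \tau, \bar\tau_i) = \bigwedge\left\{ [\![(\tau_{ij} \le_s \tau_{ik})]\!] \;\middle|\; \Delta \vdash \beta_{ij} : \alpha_i,\ \tau[\beta_{ij}^-],\ \Delta \vdash \beta_{ik} : \alpha_i,\ \tau[\beta_{ik}^+],\ C \models (\beta_{ij} \rightarrow \beta_{ik}) \right\}$$

$$\mathcal{T}(\Delta, \tau, \bar\tau) = \bigwedge_{\alpha_i \in range(\Delta)} \mathcal{T}_{\alpha_i}(\Delta, \tau, \bar\tau_i)$$

Recall that we only consider directed constraints (Definition 1), and the inference rules preserve this condition.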

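The set of generated constraints is a subclass of HORN. Hence, the test $C \models (\beta_{ij} \rightarrow \beta_{ik})$ can be performed in linear time. It remains to handle constraints of the form $(\delta \le_f \beta)$. In Example 3 we had the constraint $(\delta_2 \le_f \beta_1)$, where $\delta_2$ was instantiated to $\delta_{13}$ and $\beta_1$ to $\delta_{10} \xrightarrow{\delta_{11}} \delta_{12}$. We define

$$\mathcal{A}_{\alpha_i}(\bar\beta_i, \bar\tau_i) = \bigwedge\{\beta_{ij} \rightarrow |\tau_{ij}|\}$$

$$\mathcal{A}(\Delta, \bar\beta, \bar\tau) = \bigwedge_{\alpha_i \in range(\Delta)} \mathcal{A}_{\alpha_i}(\bar\beta_i, \bar\tau_i)$$

where $|\tau|$ refers to $\tau$'s top-level annotation and is defined as follows:

$$|b| = b \qquad |\beta| = \beta \qquad |\tau \xrightarrow{b} \tau'| = b$$

This ensures that if we have a relation $\delta \rightarrow \beta_{ij}$ in the constraint component of the type scheme, then together with $\mathcal{A}_{\alpha_i}$ we obtain $\delta \rightarrow |\tau_{ij}|$.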

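Example 6. We reconsider Example 3. Since $\beta_1 \rightarrow \beta_3$ holds in $id$'s binding-time type, the $\mathcal{T}$ operator yields the constraint $[\![(\delta_{10} \xrightarrow{\delta_{11}} \delta_{12} \le_s \delta_{14} \xrightarrow{\delta_{15}} \delta_{16})]\!]$. The $\mathcal{A}$ operator yields the constraint $\beta_1 \rightarrow \delta_{11} \wedge \beta_3 \rightarrow \delta_{15}$. Adding $id$'s constraint component $[\![(\beta_1 \le_s \beta_3) \wedge (\delta_2 \le_f \beta_1) \wedge (\delta_2 \le_f \beta_3)]\!]$, we obtain exactly the same result as the instantiation method.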
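The two Test-Add operators can be sketched directly from their definitions. This Python sketch is our own illustration (the function names and encoding are hypothetical, and it is not the authors' implementation). `apply_test_add` returns the $\mathcal{T}$ and $\mathcal{A}$ contributions as implication lists; it uses a linear-time forward-chaining check for $C \models (\beta_{ij} \rightarrow \beta_{ik})$ and the polarity conditions of Definition 1. The usage reproduces Example 6.

```python
from collections import defaultdict

def top(t):
    """|t|: the top-level annotation of a binding-time type."""
    return t[1] if isinstance(t, tuple) else t

def tr_leq_s(t, u):
    """[(t <=_s u)] as a list of implications (a, b), read a -> b."""
    if not isinstance(t, tuple):
        return [(t, u)]
    return [(t[1], u[1])] + tr_leq_s(u[2], t[2]) + tr_leq_s(t[3], u[3])

def occurs(beta, tau, sign):
    """tau[beta+] for sign = 1, tau[beta-] for sign = -1 (Definition 1)."""
    if not isinstance(tau, tuple):
        return sign == 1 and tau == beta
    return occurs(beta, tau[2], -sign) or occurs(beta, tau[3], sign)

def entails(imps, x, y):
    """C |= x -> y for implications over variables and the constants S, D."""
    if x == "S" or y == "D":
        return True
    succ = defaultdict(set)
    for a, b in imps:
        succ[a].add(b)
    seen, stack = {x, "D"}, [x, "D"]
    while stack:
        for w in succ[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return y in seen or "S" in seen

def apply_test_add(C, tau, shape, inst):
    """Return (T, A) for a scheme with Boolean constraint C and type tau.
    shape maps each beta to its type variable; inst maps it to its instance."""
    T, A = [], []
    for bj, aj in shape.items():
        A.append((bj, top(inst[bj])))                   # beta_ij -> |tau_ij|
        for bk, ak in shape.items():
            if (bj != bk and aj == ak and occurs(bj, tau, -1)
                    and occurs(bk, tau, 1) and entails(C, bj, bk)):
                T += tr_leq_s(inst[bj], inst[bk])       # [(tau_ij <=_s tau_ik)]
    return T, A

# Example 6: id instantiated at Int -> Int
C = [("b1", "b3"), ("d2", "b1"), ("d2", "b3")]
tau = ("->", "d2", "b1", "b3")
inst = {"b1": ("->", "d11", "d10", "d12"), "b3": ("->", "d15", "d14", "d16")}
T, A = apply_test_add(C, tau, {"b1": "a", "b3": "a"}, inst)
print(T)  # [('d11', 'd15'), ('d14', 'd10'), ('d12', 'd16')]
print(A)  # [('b1', 'd11'), ('b3', 'd15')]
```

Quantifying $\beta_1$ and $\beta_3$ away from $C \wedge \mathcal{T} \wedge \mathcal{A}$ then gives the instantiated constraint up to the renaming of $\delta_2$ to $\delta_{13}$, as Lemma 4 states.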

The following states that “Instantiate” and “Test-Add” are equivalent.
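Lemma 4. Given a binding-time type scheme $\eta = \forall\bar\beta,\bar\delta.C \Rightarrow \tau$, an underlying type scheme $\sigma = \forall\bar\alpha.t$, a shape environment $\Delta$ and a sequence of instantiation types $\tau_{ij}$ such that $\Delta \vdash \tau : t$ and, for each $\Delta \vdash \beta_{ij} : \alpha_i$, it holds that $\Delta \vdash \tau_{ij} : t_i$:

$$[\![[\bar\tau/\bar\beta]C]\!] = \exists\bar\beta.([\![C]\!] \wedge \mathcal{T}(\Delta, \tau, \bar\tau) \wedge \mathcal{A}(\Delta, \bar\beta, \bar\tau))$$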

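Lemma 4 allows us to replace rule (Var-Inst) with the equivalent rule (Var-Inst'):

$$(\mathrm{Var\text{-}Inst'})\ \frac{x : \forall\bar\beta,\bar\delta.C \Rightarrow \tau \in \Gamma \quad t' = [\bar t/\bar\alpha]t \quad \tau, t \vdash \Delta \quad inst(\Delta, \bar t, \bar\alpha) = \bar\tau \quad C' = [\![[\bar\delta'/\bar\delta]C]\!] \wedge \mathcal{T}(\Delta, \tau, \bar\tau) \wedge \mathcal{A}(\Delta, \bar\beta, \bar\tau) \quad \tau' = [\bar\tau/\bar\beta, \bar\delta'/\bar\delta]\tau \quad \bar\delta'\ \text{new}}{\Gamma, ((x :: \forall\bar\alpha.t) \sharp \bar t) :: t' \vdash_{inf} (C', \tau')}$$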



Theorem 3 (Polymorphic Application 1). Let $\Gamma$ be directed. Assume $\Gamma, e :: t \vdash_{inf} (C, \tau)$ using rule (Var-Inst), while $\Gamma, e :: t \vdash_{inf} (C', \tau')$ using rule (Var-Inst'). Then $C = \pi(C')$ and $\tau = \pi(\tau')$ for some renaming $\pi$.

6.2 The Match Method 

We briefly comment on further alternative methods. For more details we refer to [?]. In “Test-Add” we query possible relations between binding-time variables which are related by the shape environment to the same underlying type variable. We may wish to avoid this querying during the actual analysis. The “Match” method allows us to simply add constraints which will give a similar effect.

The method works by matching the type component of the variable’s type 
scheme with the requested type at the instantiation site. 

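Example 7. Consider again Example 0. First generate a new binding-time type of the desired form: $(\delta_{10} \rightarrow^{\delta_{11}} \delta_{12}) \rightarrow^{\delta_{13}} \delta_{14} \rightarrow^{\delta_{15}} \delta_{16}$. Instead of substituting the appropriate types for $\bar\beta$ and $\bar\delta$, generate constraints as follows:

$$(\beta_1 \rightarrow^{\delta_2} \beta_3 \leq_s (\delta_{10} \rightarrow^{\delta_{11}} \delta_{12}) \rightarrow^{\delta_{13}} (\delta_{14} \rightarrow^{\delta_{15}} \delta_{16}))$$

$$= (\delta_{10} \rightarrow^{\delta_{11}} \delta_{12} \leq_s \beta_1) \wedge (\beta_3 \leq_s \delta_{14} \rightarrow^{\delta_{15}} \delta_{16}) \wedge \delta_2 \rightarrow \delta_{13}$$

Note that the polymorphic binding-time variables $\beta_1$ and $\beta_3$ result from the polymorphic variable $\alpha$. How do we translate the structural constraints into Boolean constraints? Since in this case id’s constraints support $\beta_1 \rightarrow \beta_3$, we should have that $(\delta_{10} \rightarrow^{\delta_{11}} \delta_{12} \leq_s \delta_{14} \rightarrow^{\delta_{15}} \delta_{16})$. If id did not support $\beta_1 \rightarrow \beta_3$ then we should not have this constraint. This reasoning is captured by the constraint:

$$(\beta_1 \rightarrow \beta_3) \rightarrow [\![(\delta_{10} \rightarrow^{\delta_{11}} \delta_{12} \leq_s \delta_{14} \rightarrow^{\delta_{15}} \delta_{16})]\!]$$

$$= (\beta_1 \rightarrow \beta_3) \rightarrow (\delta_{14} \rightarrow \delta_{10} \wedge \delta_{11} \rightarrow \delta_{15} \wedge \delta_{12} \rightarrow \delta_{16})$$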
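On this small example you can check by brute force that the guarded constraint has the intended effect. The sketch below reads each constraint between variables as an implication, as in the translations above. It then compares two formulas over all $2^9$ truth assignments: id's constraints combined with the guarded constraint, and id's constraints combined with the unguarded structural constraint. The variable names (b1, d10, …) are our own shorthand. Which truth value stands for “static” or “dynamic” does not matter for this comparison.

```python
from itertools import product

VARS = ["b1", "b3", "d2", "d10", "d11", "d12", "d14", "d15", "d16"]


def imp(p, q):
    return (not p) or q


def id_constraints(v):
    # [[(b1 <=_s b3) /\ (d2 <=_f b1) /\ (d2 <=_f b3)]] as implications
    return imp(v["b1"], v["b3"]) and imp(v["d2"], v["b1"]) and imp(v["d2"], v["b3"])


def structural(v):
    # [[d10 ->^{d11} d12 <=_s d14 ->^{d15} d16]]
    return imp(v["d14"], v["d10"]) and imp(v["d11"], v["d15"]) and imp(v["d12"], v["d16"])


def guarded(v):
    # (b1 -> b3) -> [[...]], the constraint generated by the Match method
    return imp(imp(v["b1"], v["b3"]), structural(v))


def equivalent(f, g):
    """True iff f and g agree on every assignment to VARS."""
    for bits in product([False, True], repeat=len(VARS)):
        v = dict(zip(VARS, bits))
        if f(v) != g(v):
            return False
    return True


with_guard = lambda v: id_constraints(v) and guarded(v)
with_plain = lambda v: id_constraints(v) and structural(v)
print(equivalent(with_guard, with_plain))   # True: id supports b1 -> b3
print(equivalent(guarded, structural))      # False: without id, the guard matters
```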
7 Constraint Based Fixed Points

If we are willing to give up some accuracy in the presence of polymorphic recur- 
sion in binding-time types then we can replace the (Fix) rule by a simpler rule 
which just ensures that the constraints generated will be above the least fixed 
point, while avoiding a fixed point computation. 

The following rule forces all recursive invocations to have the same binding-time type. This must be a supertype of the binding-time type of the recursively defined expression.

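(FixC)
$$\frac{\begin{array}{c} \Delta, t \vdash \tau' \qquad \Gamma \cup \{x : \tau'\}, e :: t \vdash (C_1, \tau) \qquad \Delta, t \vdash \tau'' \\ C = C_1 \wedge [\![\tau \leq_s \tau']\!] \wedge [\![\tau \leq_s \tau'']\!] \wedge [\![\mathit{wft}(\tau')]\!] \wedge [\![\mathit{wft}(\tau'')]\!] \end{array}}{\Gamma, (\mathbf{fix}\ x :: t\ \mathbf{in}\ e) :: t \vdash (C, \tau'')}$$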

The “shortcut” of stipulating $(\tau \leq_s \tau')$ may have the unfortunate side-effect of introducing constraints amongst the arguments to a function. An example of this phenomenon appears in Example 8 below. A simple way of eliminating such constraints is to couch the result in terms of a fresh binding-time type $\tau''$, with the constraint $(\tau \leq_s \tau'')$.

The following shows how this approach may lose accuracy, finding a correct 
but inaccurate binding-time type for a function. 

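Example 8. Consider again g from Example 0.

g :: Bool -> Int -> Int -> (Int,Int)
g p x y = if p then (x,y) else (snd (g p y x), y)

The binding-time type inferred using the rule (Fix) was

$$g : \forall\bar\delta.\ \delta_7 \rightarrow \delta_{28} \wedge \delta_8 \rightarrow \delta_{26} \wedge \delta_9 \rightarrow \delta_{27} \Rightarrow \delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{26}, \delta_{27})^{\delta_{28}}$$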

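This time, the initial environment $\Gamma$ has

$$g : \delta_1 \rightarrow \delta_2 \rightarrow \delta_3 \rightarrow (\delta_4, \delta_5)^{\delta_6}$$

so that $\delta_1, \ldots, \delta_6$ are not generic. For the subexpression g p y x, we now get

$$[\![\delta_1 \rightarrow \delta_2 \rightarrow \delta_3 \rightarrow (\delta_4, \delta_5)^{\delta_6} \leq_s \delta_7 \rightarrow \delta_9 \rightarrow \delta_8 \rightarrow (\delta_{11}, \delta_{12})^{\delta_{13}}]\!]$$

This translates to

$$\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_4 \rightarrow \delta_{11} \wedge \delta_5 \rightarrow \delta_{12} \wedge \delta_6 \rightarrow \delta_{13}$$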

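and none of the variables involved can be discarded. For snd (g p y x) we introduce the result type $\delta_{14}$, and add the constraint

$$\delta_{16} \rightarrow \delta_{18} \wedge [\![(\delta_{15}, \delta_{16})^{\delta_{17}} \rightarrow \delta_{18} \leq_s (\delta_{11}, \delta_{12})^{\delta_{13}} \rightarrow \delta_{14}]\!]$$

that is,

$$\delta_{16} \rightarrow \delta_{18} \wedge \delta_{11} \rightarrow \delta_{15} \wedge \delta_{12} \rightarrow \delta_{16} \wedge \delta_{13} \rightarrow \delta_{17} \wedge \delta_{18} \rightarrow \delta_{14}$$

Here variables $\delta_{11}, \ldots, \delta_{13}$ and $\delta_{15}, \ldots, \delta_{18}$ can be discarded. Existentially quantifying over these leaves us with

$$\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{14}$$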
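Every constraint in this example is an implication between two variables. Existentially quantifying away a set of variables therefore amounts to keeping exactly those implications between the remaining variables that follow by chaining through the discarded ones. Below is a minimal Python sketch of that projection, applied to the two constraints above. The function name project and the encoding of $\delta_n$ as the string "dn" are our own.

```python
from collections import defaultdict


def project(imps, keep):
    """Existentially quantify every variable outside `keep` from a conjunction
    of implications x -> y (given as pairs).  For such constraints the result
    is the set of implications between kept variables obtained by following
    chains of implications through the discarded ones."""
    succ = defaultdict(set)
    for x, y in imps:
        succ[x].add(y)
    result = set()
    for src in keep:
        seen, stack = {src}, [src]
        while stack:                       # depth-first reachability from src
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result |= {(src, dst) for dst in seen if dst in keep and dst != src}
    return result


def d(*ns):
    return {f"d{n}" for n in ns}


call = [("d7", "d1"), ("d9", "d2"), ("d8", "d3"),
        ("d4", "d11"), ("d5", "d12"), ("d6", "d13")]    # from g p y x
snd = [("d16", "d18"), ("d11", "d15"), ("d12", "d16"),
       ("d13", "d17"), ("d18", "d14")]                  # from snd (g p y x)
keep = d(*range(1, 11), 14)                             # discard d11..d13, d15..d18
print(sorted(project(call + snd, keep)))
# [('d5', 'd14'), ('d7', 'd1'), ('d8', 'd3'), ('d9', 'd2')]
```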



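So

$$\Gamma, \mathrm{snd}\,(g\ p\ y\ x) \vdash (\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{14},\ \delta_{14})$$

It follows that

$$\Gamma, (\mathrm{snd}\,(g\ p\ y\ x), y) \vdash (\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{14},\ (\delta_{14}, \delta_9)^{\delta_{19}})$$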

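Calling the result of g’s body $(\delta_{20}, \delta_{21})^{\delta_{22}}$, we get for the ite expression

$$[\![\delta_{23} \rightarrow (\delta_{24}, \delta_{25})^{\delta_{26}} \rightarrow (\delta_{27}, \delta_{28})^{\delta_{29}} \rightarrow (\delta_{30}, \delta_{31})^{\delta_{32}} \leq_s \delta_7 \rightarrow (\delta_8, \delta_9)^{\delta_{10}} \rightarrow (\delta_{14}, \delta_9)^{\delta_{19}} \rightarrow (\delta_{20}, \delta_{21})^{\delta_{22}}]\!]$$

with the additional constraint

$$\delta_{23} \rightarrow \delta_{32} \wedge \delta_{24} \rightarrow \delta_{30} \wedge \delta_{25} \rightarrow \delta_{31} \wedge \delta_{26} \rightarrow \delta_{32} \wedge \delta_{27} \rightarrow \delta_{30} \wedge \delta_{28} \rightarrow \delta_{31} \wedge \delta_{29} \rightarrow \delta_{32}$$

The structural constraint translates to

$$\begin{array}{l} \delta_7 \rightarrow \delta_{23} \wedge \delta_8 \rightarrow \delta_{24} \wedge \delta_9 \rightarrow \delta_{25} \wedge \delta_{10} \rightarrow \delta_{26} \wedge \delta_{14} \rightarrow \delta_{27} \; \wedge \\ \delta_9 \rightarrow \delta_{28} \wedge \delta_{19} \rightarrow \delta_{29} \wedge \delta_{30} \rightarrow \delta_{20} \wedge \delta_{31} \rightarrow \delta_{21} \wedge \delta_{32} \rightarrow \delta_{22} \end{array}$$

Variables of interest are $\delta_1, \ldots, \delta_9$ and $\delta_{20}, \ldots, \delta_{22}$. Eliminating the rest, we get

$$\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{20} \wedge \delta_7 \rightarrow \delta_{22} \wedge \delta_8 \rightarrow \delta_{20} \wedge \delta_9 \rightarrow \delta_{21}$$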

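For the lambda abstraction we therefore obtain the result

$$(\delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{20} \wedge \delta_7 \rightarrow \delta_{22} \wedge \delta_8 \rightarrow \delta_{20} \wedge \delta_9 \rightarrow \delta_{21},\ \delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{20}, \delta_{21})^{\delta_{22}})$$

At this point, the rule (FixC) adds the following two inequalities

$$[\![\delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{20}, \delta_{21})^{\delta_{22}} \leq_s \delta_1 \rightarrow \delta_2 \rightarrow \delta_3 \rightarrow (\delta_4, \delta_5)^{\delta_6}]\!]$$

and

$$[\![\delta_7 \rightarrow \delta_8 \rightarrow \delta_9 \rightarrow (\delta_{20}, \delta_{21})^{\delta_{22}} \leq_s \delta_{33} \rightarrow \delta_{34} \rightarrow \delta_{35} \rightarrow (\delta_{36}, \delta_{37})^{\delta_{38}}]\!]$$

Altogether we then have

$$\begin{array}{l} \delta_7 \rightarrow \delta_1 \wedge \delta_9 \rightarrow \delta_2 \wedge \delta_8 \rightarrow \delta_3 \wedge \delta_5 \rightarrow \delta_{20} \wedge \delta_7 \rightarrow \delta_{22} \wedge \delta_8 \rightarrow \delta_{20} \wedge \delta_9 \rightarrow \delta_{21} \; \wedge \\ \delta_1 \rightarrow \delta_7 \wedge \delta_2 \rightarrow \delta_8 \wedge \delta_3 \rightarrow \delta_9 \wedge \delta_{20} \rightarrow \delta_4 \wedge \delta_{21} \rightarrow \delta_5 \wedge \delta_{22} \rightarrow \delta_6 \; \wedge \\ \delta_{33} \rightarrow \delta_7 \wedge \delta_{34} \rightarrow \delta_8 \wedge \delta_{35} \rightarrow \delta_9 \wedge \delta_{20} \rightarrow \delta_{36} \wedge \delta_{21} \rightarrow \delta_{37} \wedge \delta_{22} \rightarrow \delta_{38} \end{array}$$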
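Note that this has spurious consequences such as $\delta_9 \rightarrow \delta_8$. However, for the result we are only interested in the freshly introduced $\delta_{33}, \ldots, \delta_{38}$, so spurious consequences are removed. After existential quantification we obtain the (weaker) result below, whose new conjuncts relative to Example 0 are $\delta_{34} \rightarrow \delta_{37}$ and $\delta_{35} \rightarrow \delta_{36}$:

$$g : \forall\bar\delta.\ \delta_{33} \rightarrow \delta_{38} \wedge \delta_{34} \rightarrow \delta_{36} \wedge \delta_{35} \rightarrow \delta_{37} \wedge \delta_{34} \rightarrow \delta_{37} \wedge \delta_{35} \rightarrow \delta_{36} \Rightarrow \delta_{33} \rightarrow \delta_{34} \rightarrow \delta_{35} \rightarrow (\delta_{36}, \delta_{37})^{\delta_{38}}$$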
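You can replay the final projection the same way, again assuming that the constraint consists only of implications between variables. The Python sketch below projects the combined constraint onto $\delta_{33}, \ldots, \delta_{38}$. It then compares the outcome with the (Fix) result of Example 0, renamed to the same variables. The renaming is our own bookkeeping: $\delta_7, \delta_8, \delta_9, \delta_{26}, \delta_{27}, \delta_{28}$ become $\delta_{33}, \ldots, \delta_{38}$. The sketch also confirms the spurious consequence between the arguments.

```python
from collections import defaultdict


def project(imps, keep):
    """Implications between variables in `keep` entailed by a conjunction of
    variable-to-variable implications (existential quantification of the rest)."""
    succ = defaultdict(set)
    for x, y in imps:
        succ[x].add(y)
    result = set()
    for src in keep:
        seen, stack = {src}, [src]
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result |= {(src, dst) for dst in seen if dst in keep and dst != src}
    return result


def d(*ns):
    return {f"d{n}" for n in ns}


altogether = [
    ("d7", "d1"), ("d9", "d2"), ("d8", "d3"), ("d5", "d20"),
    ("d7", "d22"), ("d8", "d20"), ("d9", "d21"),          # body of g
    ("d1", "d7"), ("d2", "d8"), ("d3", "d9"),
    ("d20", "d4"), ("d21", "d5"), ("d22", "d6"),          # tau <=_s tau'
    ("d33", "d7"), ("d34", "d8"), ("d35", "d9"),
    ("d20", "d36"), ("d21", "d37"), ("d22", "d38"),       # tau <=_s tau''
]
result = project(altogether, d(*range(33, 39)))
fix_result = {("d33", "d38"), ("d34", "d36"), ("d35", "d37")}   # Example 0, renamed
print(sorted(result))
print(sorted(result - fix_result))   # [('d34', 'd37'), ('d35', 'd36')]
```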



Theorem 4. The FixC rule is sound. 

8 Discussion 

We have presented a binding-time analysis with a wide scope. The analysis is 
polyvariant and extends Dussart, Henglein and Mossin’s analysis [Z| to poly- 
morphically typed programs. It applies to a functional programming language 
with ML-style polymorphism. The handling of (type) polymorphic application 
is not straightforward. We have outlined some options, utilising the increased 
expressiveness that we obtain by using Boolean constraints. 

Types provide useful information about a program’s properties and can read- 
ily be extended for various program analyses for higher-order functional lan- 
guages. A recent trend is to present (and implement) program analysis as “non- 
standard type inference”, by marrying a type language with notation for dec- 
orating type expressions, the decorations expressing the program properties of 
interest. The direction that much of this research is currently taking is to extend 
the underlying type language beyond what the programming language requires, 
for example to include intersection types or subtyping. 

Our view is that, while it is both convenient and elegant to express program 
analysis as constrained-type inference, the language for expressing program prop- 
erties should not necessarily be coupled closely with the type system. From the 
point of view of the analysis designer and implementor, it seems more attractive 
to utilise expressive constraint languages that come with well-developed solvers 
and well-understood theories, making only the reasonable assumption that pro- 
grams presented to the analyser are well-typed and explicitly typed, for example 
by an earlier type inference phase in a compiler. 

We believe that propositional logic is under-utilised in the analysis of functional programs. It offers straightforward means for expressing dependencies and disjunctive information.¹ Indeed, some of our proposed solutions to the problem of analysis in the presence of polymorphic application are expressed through the use of non-trivial propositional formulas such as $(\beta \rightarrow \beta') \rightarrow C$, as seen in Example 7.

We have a prototype implementation of the binding-time analysis presented 
here for the Haskell compiler, GHC. Our implementation takes a Haskell program 

¹ Dussart, Henglein and Mossin have disjunction in their language of binding-time properties, but only as a technical aid for the elimination of variables (as they do not have variable projection).
and assigns a binding-time type to every sub-expression. The information could be used by a program specialiser, for example that of Henglein and Mossin, suitably extended to support type polymorphic programs. However, we have no such specialiser, and our interest is primarily the analysis of functional languages.

The implementation includes necessary extensions to the analysis described 
here so that the language accepted by GHC is supported. In particular, it sup- 
ports algebraic data types, and corresponding constructors and case statements. 

GHC translates the input program into a de-sugared, typed internal language, Core. A GHC analysis or optimisation pass is a plug-in module that transforms an input Core program into a transformed Core program (maintaining type correctness). Since type information is explicit in the Core program, the implementation is a direct translation of the analysis here. Our BTA pass annotates variables with their BTA properties. An increasing number of modern compilers have a similar internal language, and it should not be difficult to incorporate our implementation within these.

We have experimented with two constraint solvers, Schachte’s ROBDD library for Boolean constraints and ImpSolver, a straightforward solver that is limited to conjunctions of implications between variables. ImpSolver is sufficient for this analysis if we employ the Test-Add method for polymorphic application.
A third solver, for propositional Horn clauses, is under development for other 
functional program analyses. 

We have run the ImpSolver based analyser on the NoFib suite of benchmark 
programs which is available with GHC. This suite consists of over 60 (multi- 
module) programs. For the 68 larger modules, those that take over 2 seconds 
to compile on our lightly loaded server, the average cost of the analysis is 23% 
of compile time, with 16% median, 1% minimum and 80% maximum. This is 
before non-trivial performance tuning has been attempted. 

The implementation is at an early stage of evaluation. It does not yet support 
cross-module analysis (instead we make a ‘good’ guess for imported variables). 
Adding this is straightforward: with each exported binding in the module’s in- 
terface file we add its binding-time type and list the implications which hold 
amongst the binding-time variables. When a binding is imported, we recreate 
the corresponding constraints and add the binding to the initial environment. 

We believe that the ideas presented here have many immediate applications. 
Binding-time analysis is essentially identifying dependencies amongst compo- 
nents of expressions, and this is also the heart of several other analyses. For 
example, this is the case for security analysis and for useless-code analysis [5, 15, 26], program slice analysis, etc. Several researchers have addressed
the problem of how to systematically extend a type system with dependency 
information |llf)ll8ll9j . The goal of these researchers is similar to ours, but gen- 
erally more limited with respect to scope. 

Fähndrich and Rehof have recently proposed a new method for flow analysis. In the absence of recursive types, the method offers improved worst-case time complexity for flow analysis, namely cubic time analysis. The key to the improvement is a concept of “instantiation” constraints. This is an extension of the constraint language which offers a way of avoiding the copying of constraints that otherwise happens when a polymorphic type is instantiated. The constraint solving problem is then translated to one of CFL reachability. A different kind of extension of the constraint language is proposed by Gustavsson and Svenningsson. A “let” construct allows for “constraint abstractions”, so that instantiation can be expressed directly in the constraint language. Again this leads to a cubic-time algorithm. The authors point out that the idea of incorporating instantiation into a constraint language goes back at least to Henglein’s use of semi-unification constraints. We are currently investigating these methods and hope to explore whether, for our implementations, the improved complexity will translate into faster analysis.

We are now developing a generic framework for specifying and implementing 
program analyses, based on constrained-type inference. We see this binding-time 
analysis as a first instance, that is, a proof of concept. We are currently applying 
the framework to strictness and flow analysis. 
Abstract. Many type based program analyses with subtyping, such as flow analysis, are based on inequality constraints over a lattice. When inequality constraints are combined with polymorphism it is often hard to scale the analysis up to large programs. A major source of inefficiency in conventional implementations stems from computing substitution instances of constraints. In this paper we extend the constraint language with constraint abstractions so that instantiation can be expressed directly in the constraint language, and we give a cubic-time algorithm for constraint solving. As an application, we illustrate how a flow analysis with flow subtyping, flow polymorphism and flow-polymorphic recursion can be implemented in $O(n^3)$ time, where $n$ is the size of the explicitly typed program.



1 Introduction 

Constraints are at the heart of many modern program analyses. These analyses are often implemented in two stages. The first stage collects constraints in an appropriate constraint language and the second stage finds a solution (usually the least) to the constraints. If the constraints are collected through a simple linear-time traversal over the program yielding a linear number of constraints, the first phase can hardly constitute a bottleneck. But often the constraints for a program point are computed by performing a non-constant-time operation on constraints collected for another part of the program. Notable examples, and the motivation for this work, are analyses which combine subtyping and polymorphism. There, typically, the constraints for a call to a polymorphic function $f$ are a substitution instance of the constraints for the body of $f$. For these analyses, naively collecting constraints typically leads to unacceptable performance. Consider, for example, how we could naively collect the constraints for a program of the following form.

let f_0 = ...
in let f_1 = ... f_0 ... f_0 ...
in let ...
in let f_n = ... f_{n-1} ... f_{n-1} ...
in ...

We first collect the constraints for the polymorphic function $f_0$. Then for the two calls to $f_0$ in the body of $f_1$, we compute two different substitution instances of the constraints from the body of $f_0$. As a result the number of constraints for $f_1$ will be at least twice as many as those for $f_0$. Thus, the number of resulting
constraints grows exponentially in the call depth n (even if the underlying types 
are small). In analyses which combine subtyping and polymorphic recursion, 
and rely on a fixed point iteration, this effect may show up in every step of 
the iteration and thus the constraints may grow exponentially in the number of 
required iterations. We can drastically reduce the number of constraints if we 
can simplify the constraints to fewer but equivalent constraints. It is therefore 
no surprise that lots of work has been put into techniques for how to simplify 
constraints ITMHHI ITjurhOL IKaeH21 iSmiH4L IE!STh5l IFothtil Il’ShbL IFAH6L IAWT971 
IHehDTL IhThTI . 
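The exponential growth described above can be made concrete with a small count. The Python sketch below makes a hypothetical assumption: each body $f_i$ contributes one constraint of its own and calls $f_{i-1}$ twice. Copying the callee's constraints at every call then doubles the count at each level. A term that instead keeps each call as a single reference, which is the idea pursued in the rest of this paper, grows only linearly.

```python
def copied_constraints(n, local=1):
    """Constraints for f_n when every call copies (substitution-instantiates)
    the callee's constraints; `local` is the assumed number of constraints
    each body contributes on its own."""
    count = local                          # f_0
    for _ in range(n):
        count = local + 2 * count          # own constraints + two copies of f_{i-1}
    return count


def shared_term_size(n, local=1):
    """Size of a constraint term that keeps each call as one reference:
    each f_i contributes its own constraints, and f_1..f_n two calls each."""
    return (n + 1) * local + 2 * n


for n in (1, 5, 10, 20):
    print(n, copied_constraints(n), shared_term_size(n))
```

With one local constraint per body, the copied count is $2^{n+1} - 1$ (2047 for $n = 10$), while the shared term has $3n + 1$ components (31 for $n = 10$).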

Another approach is to make the constraint language more powerful so that 
constraints can be generated by a simple linear time traversal over the program. 
This can be achieved by making substitution instantiation a syntactic construct 
in the constraint language. But when we make the constraint language more 
powerful we also make constraint solving more difficult. So is this a tractable 
approach? The constraint solver could of course just perform the delayed opera- 
tions and then proceed as before. But can one do better? The answer, of course, 
depends on the constraint language in question. 

In this paper we consider a constraint language with simple inequality con- 
straints over a lattice. Such constraints show up in several type based pro- 
gram analyses such as flow analyses, e.g., binding time analyses, e.g., 

mHHnsi, usage analyses, e.g., f rwiVliidj . points-to-analyses, e.g., jKKADDj and 
uniqueness type systems jBShbj . We extend this simple constraint language with 
constraint abstractions which allow the constraints to compactly express substi- 
tution instantiation. 

The main result of this paper is a constraint solving algorithm which com- 
putes least solutions to the extended form of constraints in cubic time. We have 
used this expressive constraint language to formulate usage-polymorphic usage 
analyses with usage subtyping |Svennir(isnn| and an algorithm closely related to 
the one in this paper is presented in the second author’s Master’s thesis 
(' |(ISnn| focuses on the usage type system and no constraint solving is presented). 
In this paper, as another example, we show how the constraint language can be 
used to yield a cubic algorithm for Mossin’s polymorphic flow analysis with flow subtyping and flow-polymorphic recursion. This is a significant result: the previously published algorithm, by Mossin, is $O(n^8)$. Independently, Fähndrich and Rehof have given an algorithm for Mossin’s flow analysis based on instantiation constraints which is also $O(n^3)$. We will take a closer look at the relationship of their algorithm and ours in section 4.



1.1 Outline 

The rest of this article is organised as follows. In section 2 we introduce our constraint language and give the semantics. In section 3 we present our constraint solving algorithm, its implementation and computational complexity. Section 4 discusses related work and section 5 concludes. In appendix A we illustrate how the constraint language can be used in a flow analysis. In appendix B we give the proof of Theorem 1.

2 Constraints 

In this section we will first introduce the underlying constraint language that we consider in this paper, and then extend the constraint language with constraint abstractions which can express substitution instantiation. The atomic constraints we consider are inequality constraints of the form

$$a \leq b$$

where $a$ and $b$ are taken from a countably infinite set of variables. The constraint language also contains the trivially true constraint, conjunction of constraints and existential quantification, as given by the following grammar.

Atomic Constraints   $A ::= a \leq b$

Constraint Terms   $M, N ::= A \mid \top \mid M \wedge N \mid \exists a.M$

These kinds of constraints show up in several different type based program analyses such as, for example, flow analysis, which we will use as our running example. The constraints arise from the use of subtyping between flow types, i.e., types annotated with flow information.

Depending on the application, the constraints can be interpreted in different 
domains. For example, for flow analysis we can interpret the constraints in a 
lattice of finite sets of labels with subset as the ordering. 

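Definition 1. We interpret a constraint term in a lattice $\mathcal{L}$, with a bottom element and the ordering $\sqsubseteq$, by defining the notion of a model of a constraint term. Let $\theta$ range over mappings from variables into $\mathcal{L}$. Then $\theta \models M$, read as $\theta$ is a model of $M$, is defined inductively by the following rules.

$$\frac{\theta(a) \sqsubseteq \theta(b)}{\theta \models a \leq b} \qquad \frac{}{\theta \models \top} \qquad \frac{\theta \models M \quad \theta \models N}{\theta \models M \wedge N} \qquad \frac{\theta[a := d] \models M \quad d \in \mathcal{L}}{\theta \models \exists a.M}$$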

Given a constraint term one is usually interested in finding its optimal model (usually the least) given a fixed assignment of some of the variables. For example, in flow analysis some of the variables in the constraint term correspond to points in the program where values are produced, often referred to as the sources of flow. Other variables correspond to points in the program where values are consumed, often referred to as the targets of flow. The existentially quantified variables correspond to the flow annotations on intermediate flow types. To find the flow from the sources to the targets we can fix an assignment for the source variables (usually by associating a unique label $l$ with each source and interpreting it as the singleton set $\{l\}$) and compute the least model which respects this assignment. For this simple constraint language it is easy to compute least solutions (it can be seen as a transitive closure problem) in $O(n^3)$ time, where $n$ is the number of variables.¹
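For the plain constraint language, one way to compute the least model is to propagate label sets along the constraints. The Python sketch below is a standard worklist propagation, not the algorithm of this paper. It works in the lattice of finite label sets ordered by inclusion. It also assumes that no constraint forces a source variable (one with a fixed assignment) to grow, and it reports an error if that happens.

```python
from collections import defaultdict, deque


def least_model(constraints, sources):
    """Least model of a conjunction of atomic constraints a <= b in the lattice
    of finite label sets ordered by inclusion.  `sources` fixes the value of
    some variables; all other variables start at the bottom element (the empty
    set) and grow only as far as the constraints force them to."""
    succ = defaultdict(list)
    for a, b in constraints:
        succ[a].append(b)
    val = defaultdict(frozenset)
    for v, labels in sources.items():
        val[v] = frozenset(labels)
    todo = deque(sources)
    while todo:
        a = todo.popleft()
        for b in succ[a]:
            joined = val[b] | val[a]             # least upper bound = set union
            if joined != val[b]:
                if b in sources:
                    raise ValueError(f"no model respects the assignment of {b}")
                val[b] = joined
                todo.append(b)
    variables = set(sources) | {x for c in constraints for x in c}
    return {v: val[v] for v in variables}


m = least_model([("x", "y"), ("y", "z"), ("w", "z"), ("u", "v")],
                {"x": {"l1"}, "w": {"l2"}})
print(sorted(m["z"]), sorted(m["y"]), sorted(m["v"]))   # ['l1', 'l2'] ['l1'] []
```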

2.1 Constraint Abstractions 

When subtyping is combined with polymorphism the need to compute substitution instances of constraint terms arises. We will build this operation into our constraint language through the means of constraint abstractions.

Constraint Abstraction Variables   $f, g, h$
Constraint Abstractions   $F ::= f\,\bar a = M$

A constraint abstraction $f\,\bar a = M$ can be seen simply as a function which, when applied to some variables $\bar b$, returns $M[\bar a := \bar b]$. Constraint abstractions are introduced by a let-construct reminiscent of let-constructs in functional languages, and are also called in the same way. The complete grammar of the extended constraint language is as follows.

Atomic Constraints   $A ::= a \leq b$

Constraint Terms   $M, N ::= A \mid \top \mid M \wedge N \mid \exists a.M \mid \mathbf{let}\ \{\overline{F}\}\ \mathbf{in}\ M \mid f\,\bar a$

Constraint Abstractions   $F ::= f\,\bar a = M$
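Read as functions, non-recursive constraint abstractions can always be eliminated by inlining each call $f\,\bar b$ as $M[\bar a := \bar b]$. This inlining is exactly the copying that the extended language lets us postpone. The Python sketch below shows it using terms encoded as tuples, a hypothetical encoding of our own. It assumes that all bound variable names and abstraction names are distinct, so substitution cannot capture. It rejects recursive abstractions, whose meaning requires the coinductive semantics given below.

```python
def subst(t, s):
    """Rename variables in term t according to mapping s (assumes no capture)."""
    tag = t[0]
    if tag == "leq":                      # ("leq", a, b) is a <= b
        return ("leq", s.get(t[1], t[1]), s.get(t[2], t[2]))
    if tag == "top":
        return t
    if tag == "and":
        return ("and", subst(t[1], s), subst(t[2], s))
    if tag == "exists":                   # ("exists", a, M): a shadows s[a]
        inner = {k: v for k, v in s.items() if k != t[1]}
        return ("exists", t[1], subst(t[2], inner))
    if tag == "call":                     # ("call", f, (b1, ..., bk))
        return ("call", t[1], tuple(s.get(x, x) for x in t[2]))
    if tag == "let":                      # ("let", {f: (params, body)}, M)
        defs = {f: (ps, subst(body, {k: v for k, v in s.items() if k not in ps}))
                for f, (ps, body) in t[1].items()}
        return ("let", defs, subst(t[2], s))
    raise ValueError(f"unknown term {t!r}")


def expand(t, env=None, active=()):
    """Inline every call f b as M[a := b], producing a term without let/call."""
    env = env or {}
    tag = t[0]
    if tag in ("leq", "top"):
        return t
    if tag == "and":
        return ("and", expand(t[1], env, active), expand(t[2], env, active))
    if tag == "exists":
        return ("exists", t[1], expand(t[2], env, active))
    if tag == "let":
        return expand(t[2], {**env, **t[1]}, active)
    if tag == "call":
        f, args = t[1], t[2]
        if f in active:
            raise ValueError(f"recursive abstraction {f} cannot be inlined")
        params, body = env[f]
        return expand(subst(body, dict(zip(params, args))), env, active + (f,))
    raise ValueError(f"unknown term {t!r}")


# let { f a = a <= b } in f c /\ f d
term = ("let", {"f": (("a",), ("leq", "a", "b"))},
        ("and", ("call", "f", ("c",)), ("call", "f", ("d",))))
print(expand(term))   # ('and', ('leq', 'c', 'b'), ('leq', 'd', 'b'))
```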

We will write $\mathit{FV}(M)$ for the free variables of $M$ and $\mathit{FAV}(M)$ for the free abstraction variables of $M$. We will identify constraint terms up to α-equivalence, that is, the renaming of bound variables and bound abstraction variables. In $\mathbf{let}\ \{\overline{F}\}\ \mathbf{in}\ M$ the constraint abstraction variables defined by $\overline{F}$ are bound both in $M$ and in the bodies of $\overline{F}$, so our lets are mutually recursive. Consequently the variables defined by $\overline{F}$ must be distinct. We will use $\Gamma$ to range over sets of constraint abstractions where the defined variables are distinct, and we will denote the addition of a group of distinct constraint abstractions $\overline{F}$ to $\Gamma$ by juxtaposition: $\Gamma\{\overline{F}\}$. We will say that a group of constraint abstractions $\overline{F}$ is garbage in $\mathbf{let}\ \Gamma\{\overline{F}\}\ \mathbf{in}\ M$ if we can remove the abstractions without causing bound abstraction variables to become free. Recursive constraint abstractions go beyond just expressing a delayed substitution instantiation. They also allow us to express a fixed-point calculation in a very convenient way. We will make use of this in the flow analysis in appendix A to express flow-polymorphic recursion.
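For a single let, you can find the garbage by reachability. Removing every abstraction that cannot be reached from the body by following calls leaves no call unbound. Removing a reachable abstraction would leave some call unbound. So the abstractions unreachable from the body form the largest garbage group. The Python sketch below is hypothetical: each abstraction is represented only by the set of abstraction names called in its body.

```python
def garbage(body_calls, defs):
    """Largest garbage group of a let: the abstractions not reachable from the
    let body by following calls.  `defs` maps each abstraction name to the set
    of abstraction names called in its body; `body_calls` are the names
    called directly in the let body."""
    live, stack = set(), list(body_calls)
    while stack:
        f = stack.pop()
        if f not in live:
            live.add(f)
            stack.extend(defs.get(f, ()))
    return set(defs) - live


defs = {"f": {"g"}, "g": set(), "h": {"h", "f"}}
print(garbage({"f"}, defs))   # {'h'}
```

Here h calls f and itself, but nothing reachable from the body calls h, so h can be removed.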

To give a semantics to the extended constraint language we need to define the notion of a model of a constraint term in the context of a set of constraint abstractions $\Gamma$.

Definition 2. In a lattice $\mathcal{L}$, with a bottom element and with the ordering $\sqsubseteq$, we define $\theta; \Gamma \models M$ coinductively by the following rules (for coinductive definitions we follow the notational convention of Cousot and Cousot).

¹ For a lattice where binary least upper bounds can be computed in constant time (for example a two-point lattice) the least solution can be computed in $O(n^2)$ time.



Constraint Abstractions 






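$$\frac{\theta(a) \sqsubseteq \theta(b)}{\theta; \Gamma \models a \leq b} \qquad \frac{}{\theta; \Gamma \models \top} \qquad \frac{\theta; \Gamma \models M \quad \theta; \Gamma \models N}{\theta; \Gamma \models M \wedge N}$$

$$\frac{\theta[a := d]; \Gamma \models M \quad d \in \mathcal{L} \quad a \notin \mathit{FV}(\Gamma)}{\theta; \Gamma \models \exists a.M}$$

$$\frac{\theta; \Gamma\{\overline{F}\} \models M}{\theta; \Gamma \models \mathbf{let}\ \{\overline{F}\}\ \mathbf{in}\ M} \qquad \frac{\theta; \Gamma\{f\,\bar a = M\} \models M[\bar a := \bar b]}{\theta; \Gamma\{f\,\bar a = M\} \models f\,\bar b}$$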

The definition needs to be coinductive to cope with recursive constraint abstractions. The coinductive definition expresses the intuitive concept that such constraint abstractions should be “unfolded infinitely”. When it is not clear from the context we will write $\theta; \Gamma \models_{\mathcal{L}} M$ to make explicit which lattice we consider. We will say that $N$ is a consequence of $M$, written $M \models N$, iff for every $\mathcal{L}$, $\theta$, $\Gamma$, if $\theta; \Gamma \models_{\mathcal{L}} M$ then $\theta; \Gamma \models_{\mathcal{L}} N$. We will write $M \dashv\models N$ iff $M \models N$ and $N \models M$.

In definitions throughout this paper we will find it convenient to work with constraint term contexts. A constraint term context is simply a constraint term with a “hole”, analogous to the term contexts used extensively in operational semantics.

Constraint Term Contexts   $C ::= [\cdot] \mid C \wedge M \mid M \wedge C \mid \exists a.C \mid \mathbf{let}\ \Gamma\ \mathbf{in}\ C \mid \mathbf{let}\ \Gamma\{f\,\bar a = C\}\ \mathbf{in}\ M$

We will write $C[M]$ to denote the filling of the hole in $C$ with $M$. Hole filling may capture variables. We will write $\mathit{CV}(C)$ for the variables that may be captured when filling the hole. We will say that the hole in $C$ is live if the hole does not
occur in a constraint abstraction which is garbage. Our first use of constraint 
term contexts is in the definition of the free live atomic constraints of a constraint 
term. 

Definition 3. The set of free live atomic constraints of a constraint term $M$, denoted $\mathit{LIVE}(M)$, is defined as follows.

$$\mathit{LIVE}(M) = \{A \mid M = C[A],\ \mathit{FV}(A) \cap \mathit{CV}(C) = \emptyset \text{ and the hole in } C \text{ is live}\}$$

We will use LIVE(M) in definitions where we need to refer to the atomic sub- 
terms of M but want to exclude those which occur in constraint abstractions 
which are garbage and thus never will be “called” by the models relation. Note 
that all syntactically live constraint abstractions are semantically live since they 
are all “called” by the models relation. 

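Another use of constraint term contexts is in the statement of the following unwinding lemma.

Lemma 1. If $\mathit{FV}(M) \cap \mathit{CV}(C) = \emptyset$ then

$$\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b] \;\dashv\models\; \mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[M[\bar a := \bar b]]$$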

This lemma is necessary, and is the only difficulty, when proving the subject reduction property of the usage analysis cited above and of the flow analysis in appendix A.
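1. if $a \leq b,\ b \leq c \in \mathit{LIVE}(M)$ then
$$\exists b.M \mapsto \exists b.M \wedge a \leq c$$

2. if $A \in \mathit{LIVE}(M)$ and, for some $i$, $a_i \in \mathit{FV}(A)$ then
$$\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b] \mapsto \mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b \wedge A[\bar a := \bar b]]$$

3. if $A \in \mathit{LIVE}(C[f\,\bar b])$ and, for some $i$, $a_i \in \mathit{FV}(A)$ then
$$\mathbf{let}\ \Gamma\{f\,\bar a = C[f\,\bar b]\}\ \mathbf{in}\ M \mapsto \mathbf{let}\ \Gamma\{f\,\bar a = C[f\,\bar b \wedge A[\bar a := \bar b]]\}\ \mathbf{in}\ M$$

4. if $A \in \mathit{LIVE}(M)$ and, for some $i$, $a_i \in \mathit{FV}(A)$ then
$$\mathbf{let}\ \Gamma\{f\,\bar a = M\}\{g\,\bar c = C[f\,\bar b]\}\ \mathbf{in}\ N \mapsto \mathbf{let}\ \Gamma\{f\,\bar a = M\}\{g\,\bar c = C[f\,\bar b \wedge A[\bar a := \bar b]]\}\ \mathbf{in}\ N$$

Fig. 1. Rewrite rules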



The premise $\mathit{FV}(M) \cap \mathit{CV}(C) = \emptyset$ is there to ensure that no inadvertent name capture takes place, and it can always be fulfilled by an α-conversion. In the remainder of this paper we will leave this condition on unwindings implicit.



3 Solving Constraints 

As we discussed in the previous section we are interested in finding the least 
model of a constraint term given a fixed assignment of some of the variables. 
In this section we will present an algorithm for this purpose for our constraint 
language. The algorithm is based on a rewrite system which rewrites constraint 
terms to equivalent but more informative ones. Every rewrite step adds an atomic 
constraint to the constraint term and the idea is that when the rules have been 
applied exhaustively then enough information is explicit in the term so that the 
models can be constructed easily. 

Definition 4. We define the rewrite relation $\rightarrow$ as the compatible closure of the relation $\mapsto$ defined by the clauses in figure 1.

Here we provide some explanation of the rewrite rules. The first rule, 

1. if $a \leq b,\ b \leq c \in \mathit{LIVE}(M)$ then $\exists b.M \mapsto \exists b.M \wedge a \leq c$

is a simple transitivity rule. If $a \leq b$ and $b \leq c$ are free live atomic subterms of $M$ we may simply add the constraint $a \leq c$. Note that the rule requires $a$ and $c$ to be in scope at the binding occurrence of $b$. As a result we cannot, for example, perform the rewrite

$$\exists a.\exists b.(a \leq b) \wedge (\exists c.\, b \leq c) \rightarrow \exists a.\exists b.(a \leq b) \wedge (\exists c.\, b \leq c \wedge a \leq c)$$

which adds $a \leq c$ although it would make perfect sense. The reason is simply that at the binding occurrence of $b$, $c$ is not in scope. The purpose of the restriction
on the transitivity rule is an important one. It reduces the number of rewrite 
steps due to transitivity by taking advantage of scoping information. The second 
rule 

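2. if $A \in \mathit{LIVE}(M)$ and, for some $i$, $a_i \in \mathit{FV}(A)$ then
$$\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b] \mapsto \mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b \wedge A[\bar a := \bar b]]$$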

allows us to unwind an atomic constraint. Note that at least one of the variables 
in A must be bound by the abstraction. The restriction is there to prevent 
rewrite steps which would not be useful anyway. The two last rules are similar 
to the second rule but deal with unwinding in mutually recursive constraint 
abstractions. A key property of the rewrite rules is that they lead to equivalent 
constraint terms. 

Lemma 2. If $M \rightarrow N$ then $M \dashv\models N$.

The property is easy to argue for the transitivity rule. For the second rule it follows from the unwinding property (Lemma 1). The two last rules rely on similar unwinding properties for unwinding in mutually recursive constraint abstractions.

3.1 Normal Forms 

Intuitively a constraint term is in normal form when the rules in figure 1 have
been applied exhaustively. But nothing stops us from performing rewrite steps 
which just add new copies of atomic constraints which are already in the con- 
straint term. We can of course do this an arbitrary number of times creating 
a sequence of terms which are different but “essentially the same” . To capture 
this notion of essentially the same we define a congruence which equates terms 
which are equal up to copies of atomic constraints. 

Definition 5. We define $\sim$ as the reflexive, transitive, symmetric and compatible closure of the following clauses.

(i) $A \wedge A \sim A$   (ii) $M \wedge \top \sim M$   (iii) $\top \wedge M \sim M$
(iv) if $\mathit{FV}(A) \cap \mathit{CV}(C) = \emptyset$ and the hole in $C$ is live then $C[A] \sim C[\top] \wedge A$



Rewriting commutes with $\sim$ so we can naturally extend $\rightarrow$ to equivalence classes of $\sim$. With the help of $\sim$ we can define the notion of a productive rewrite step $M \rightharpoonup N$, which is a rewrite step which adds a new atomic constraint.

Definition 6. $M \rightharpoonup N$ iff $M \rightarrow N$ and $M \not\sim N$.

Finally we arrive at our definition of normal form up to productive rewrite steps. 



Definition 7. $M$ is in normal form iff $M \not\rightharpoonup$, that is, no productive rewrite step applies to $M$.

The main technical theorem in this paper is that when a constraint term with 
no free constraint abstraction variables is in normal form then the models of the 
constraint term are exactly characterised by the free live atomic constraints of 
the constraint term. 

Theorem 1. If $M$ is in normal form and $\mathit{FAV}(M) = \emptyset$ then $\theta; \emptyset \models M$ iff $\theta \models \mathit{LIVE}(M)$.

Given a constraint term $M$ and a fixed assignment of some of the variables we can find its least model as follows. First we find an equivalent constraint term $N$ in normal form. Then we extract the free live atomic constraints of the normal form, which exactly characterise the models of $N$ and $M$. Since $\mathit{LIVE}(N)$ is just a set of atomic constraints we can then proceed with any standard method, such as computing the transitive closure. The proof of Theorem 1 can be found in appendix B. The key component of the proof is the application of two key properties of unwindings of normal forms. The first property is that normal forms are preserved by unwindings.

Lemma 3. If $\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b]$ is in normal form then the unwinding $\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[M[\bar a := \bar b]]$ is in normal form.

The lemma guarantees normal forms of arbitrary unwindings of a normal form, which we need because of the coinductive definition of $\theta; \Gamma \models M$. The second
property is that unwinding of a normal form does not change the free live atomic 
constraints of the constraint term. 

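Lemma 4. If $\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b]$ is in normal form then

$$\mathit{LIVE}(\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[f\,\bar b]) = \mathit{LIVE}(\mathbf{let}\ \Gamma\{f\,\bar a = M\}\ \mathbf{in}\ C[M[\bar a := \bar b]])$$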

3.2 Computing Normal Forms 

Given a constraint term $M$, we need to compute an equivalent term in normal form. Our algorithm relies on a representation of equivalence classes of terms with respect to $\sim$ and computes sequences of the form

$$M_0 \rightharpoonup M_1 \rightharpoonup M_2 \rightharpoonup \cdots$$



The termination of the algorithm is ensured by the following result. 
Lemma 5. There is no infinite sequence of the form given above. 
Proof (Sketch). Let $n$ be the number of variables (free and bound) in $M_0$. Note
that the number of variables remains constant in each step. Thus the number of
unique atomic constraints that can be added to $M$ is bounded by $n^2$. Since every
productive rewrite step introduces a new atomic constraint, the number of steps
is bounded by $n^2$.

When given a constraint term as input, our algorithm first marks all atomic
constraints. These marked constraints can be thought of as a work list of atomic
constraints to consider. The algorithm then unmarks the constraints one by one
and performs all productive rewrite steps which only involve atomic constraints
which are not marked. The new atomic constraints which are produced by a
rewrite step are initially marked. The algorithm maintains the following invariant:
the term obtained by replacing the marked constraints with $\top$ is in normal form.
The algorithm terminates with a normal form when no atomic constraints remain
marked. The pseudocode for this algorithm is given below.

Algorithm 1.

1. Mark all atomic constraints.

2. If there are no remaining marked constraints then stop; otherwise pick a
marked atomic constraint and unmark it.

3. Find all productive redexes which involve the unmarked constraint and perform
the corresponding rewrite steps. Let the added atomic constraints be
marked.

4. Go to step 2.
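The following C sketch mirrors steps 1 to 4 of Algorithm 1 under strong simplifying assumptions that are not the paper's: the term is a flat conjunction of atomic constraints over at most `MAXV` variables, there are no constraint abstractions (hence no unwind-redexes), and every variable may serve as the middle variable of a transitivity-redex, ignoring the scoping restriction. The names `Solver`, `add_atom`, `solver_init` and `solve` are hypothetical. Marked constraints sit on a stack, and `unmarked[lo][hi]` records the constraints already processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXV 32                /* assumed bound on the number of variables */
#define MAXC (MAXV * MAXV)     /* at most n^2 distinct atoms, cf. Lemma 5 */

typedef struct { int lo, hi; } Atom;   /* the atomic constraint lo <= hi */

typedef struct {
    int n, count;
    Atom atoms[MAXC];
    bool present[MAXV][MAXV];  /* lo <= hi occurs in the term (marked or not) */
    bool unmarked[MAXV][MAXV]; /* lo <= hi occurs and has been unmarked */
    int work[MAXC], top;       /* stack of indices of marked atoms */
} Solver;

/* Adds lo <= hi only if it is non-trivial and new, i.e. only for a
   productive step; a newly added atom starts out marked. */
static void add_atom(Solver *s, int lo, int hi) {
    if (lo == hi || s->present[lo][hi]) return;
    s->present[lo][hi] = true;
    s->atoms[s->count] = (Atom){lo, hi};
    s->work[s->top++] = s->count++;
}

/* Step 1: mark all input atoms. */
void solver_init(Solver *s, int n, const Atom *in, int m) {
    assert(n <= MAXV);
    memset(s, 0, sizeof *s);
    s->n = n;
    for (int k = 0; k < m; k++) add_atom(s, in[k].lo, in[k].hi);
}

/* Steps 2-4: unmark one atom at a time and fire every transitivity redex
   that pairs it with an already unmarked atom. */
void solve(Solver *s) {
    while (s->top > 0) {
        Atom c = s->atoms[s->work[--s->top]];
        s->unmarked[c.lo][c.hi] = true;
        for (int x = 0; x < s->n; x++) {
            if (s->unmarked[x][c.lo]) add_atom(s, x, c.hi);  /* x <= lo <= hi */
            if (s->unmarked[c.hi][x]) add_atom(s, c.lo, x);  /* lo <= hi <= x */
        }
    }
}
```

Scanning every `x` costs $O(n)$ per unmarked constraint; the data structures of Section 3.3 replace this scan so that the work is proportional to the number of redexes.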

3.3 Data Structures 

The efficiency of the algorithm relies on maintaining certain data structures.
In step 3 of the algorithm we use data structures such that we can solve the
following two problems:

1. find all redexes we need to consider in time proportional to the number of 
such, and 

2. decide in constant time whether a redex is productive. 

We can solve the first problem if we maintain, for every existentially bound
variable $b$,

- a list of all $a$ in scope at the point where $b$ is bound, such that $a \le b$ is an
unmarked atomic constraint in the term;

- a list of all $c$ in scope at the point where $b$ is bound, such that $b \le c$ is an
unmarked atomic constraint in the term.

With this information we can easily list all transitivity-redexes we need to
consider in step 3 in time proportional to the number of redexes. When we unmark
a constraint we can update the data structure in constant time.

For the second problem, to decide in constant time whether a redex is productive,
we need to decide, in constant time, whether the atomic constraint to be
added already exists in the term. We can achieve this by an $n \times n$ bit-matrix
where $n$ is the number of variables (free and bound) in the constraint term. If
$a \le b$ is in the term then the entry in the matrix for $(a, b)$ is 1, and 0 otherwise.
This is sufficient for the complexity argument in the next section, but in practice
we use a refined data structure which we describe in Section 3.5.
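A packed bit-matrix of this kind might look as follows in C. `BitMatrix`, `bm_new`, `bm_get`, `bm_set` and `bm_free` are hypothetical names, and variables are assumed to be numbered from 0 to $n-1$. `bm_set` reports whether the entry was previously 0, which is exactly the constant-time productivity test for a redex that would add $a \le b$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Packed n x n bit-matrix: entry (a, b) is 1 iff a <= b is in the term. */
typedef struct {
    int n;
    uint64_t *words;   /* row-major, n * n bits packed into 64-bit words */
} BitMatrix;

BitMatrix bm_new(int n) {
    BitMatrix m;
    size_t bits = (size_t)n * (size_t)n;
    m.n = n;
    m.words = calloc(bits / 64 + 1, sizeof *m.words);  /* all entries 0 */
    assert(m.words != NULL);
    return m;
}

/* Bit position of entry (a, b). */
static size_t bm_pos(const BitMatrix *m, int a, int b) {
    assert(0 <= a && a < m->n && 0 <= b && b < m->n);
    return (size_t)a * (size_t)m->n + (size_t)b;
}

bool bm_get(const BitMatrix *m, int a, int b) {
    size_t p = bm_pos(m, a, b);
    return (m->words[p / 64] >> (p % 64)) & 1u;
}

/* Sets entry (a, b); returns true iff it was 0 before, i.e. iff adding
   a <= b would be a productive step. */
bool bm_set(BitMatrix *m, int a, int b) {
    size_t p = bm_pos(m, a, b);
    uint64_t bit = UINT64_C(1) << (p % 64);
    bool fresh = (m->words[p / 64] & bit) == 0;
    m->words[p / 64] |= bit;
    return fresh;
}

void bm_free(BitMatrix *m) {
    free(m->words);
    m->words = NULL;
}
```

Packing 64 entries per word reduces memory by a factor of eight compared with one byte per entry, but the matrix still has $n^2$ entries, which is the cost Section 3.5 addresses.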



3.4 Complexity 

The cost of the algorithm is dominated by the operations performed by step 3,
which searches for productive redexes. The cost is proportional to the number
of redexes (productive or non-productive) considered, and each redex in the final
normal form is considered exactly once in step 3. Thus the cost of step 3 is
proportional to the number of redexes in the final normal form. An analysis of
the maximum number of redexes gives the following.

- The maximum number of transitivity-redexes is, for each existentially quantified
variable $a$, the square of the number of variables in scope at the point
where $a$ is bound.

- The maximum number of unwind-redexes is, for each variable $a$ bound in a
constraint abstraction $f$, two times the number of variables in scope at the
point where $a$ is bound times the number of calls to $f$.

A consequence of this analysis is the complexity result we are about to state. Let 
the skeleton of a constraint term be the term where all occurrences of atomic 
constraints and of the trivially true constraint $\top$ have been removed. What remains
are the binding occurrences of variables and all calls to constraint abstractions. 
Now, for a constraint term M, let n be the size of the skeleton of M plus the 
number of free variables of M. The complexity of the algorithm can be expressed 
in terms of n as follows. 

Theorem 2. The normal form can be computed in $O(n^3)$ time.
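One way to see how the two redex bounds combine into Theorem 2 is sketched below. It assumes that the size of the skeleton counts every argument of every call, and writes $s_a \le n$ for the number of variables in scope where $a$ is bound; there are at most $n$ existentially quantified variables.

$$\sum_{a\ \text{existential}} s_a^2 \;\le\; n \cdot n^2 \;=\; n^3$$

$$\sum_{f}\ \sum_{a \in \mathrm{params}(f)} 2\, s_a \cdot \mathrm{calls}(f) \;\le\; 2n \sum_{f} |\mathrm{params}(f)| \cdot \mathrm{calls}(f) \;\le\; 2n \cdot n$$

The inner sum of the second line counts all arguments in all calls, which is at most the size of the skeleton. Under this reading the transitivity-redexes dominate and give the cubic bound.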

3.5 Refined Data Structure 

The cost of initialising the bit-matrix described in Section 3.3 is dominated by
the cost of step 3 in the algorithm, but we believe that in practice the cost of
initialising the matrix may be significant. Also, the amount of memory required
for the matrix is quite substantial, and many entries in the matrix would be
redundant since the corresponding variables have no overlapping scope. Below
we sketch a refined approach based on this observation which we believe will
be important in practice. We associate a natural number, $index(a)$, with every
variable $a$. We assign the natural number as follows. First we choose an arbitrary
order for all the free variables and bind them existentially, in this order, at top
level. Then we assign to each variable the lexical binding level of the variable.
For example, in $\exists a.(\exists b.M) \wedge (\exists c.N)$ we assign 0 to $a$, 1 to $b$ and $c$, and so on.
Note that the number we assign to each variable is unique within the scope of the
variable. Given this we have the following data structures. For every variable $b$,
- a set of all $a$ such that $index(a) < index(b)$ and $a \le b$ is an atomic constraint
(marked or unmarked) in the term;

- a set of all $c$ such that $index(c) < index(b)$ and $b \le c$ is an atomic constraint
(marked or unmarked) in the term.

The sets have, due to scoping, the property that, for any two distinct elements
$a$ and $b$, $index(a)$ is distinct from $index(b)$. Thus the sets can be represented
by bit-arrays, indexed by $index(a)$, so that set membership can be decided in
constant time. Deciding whether an atomic constraint $a \le b$ is in the
constraint term then reduces to set membership in the appropriate set.
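A minimal C sketch of these per-variable sets follows, assuming binding levels below 64 so that each set fits in one machine word. `Var`, `add_le` and `has_le` are hypothetical names. A constraint $a \le b$ is stored with whichever of the two variables has the larger index: in the `below` set of $b$ if $index(a) < index(b)$, otherwise in the `above` set of $a$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-variable record; index is the lexical binding level,
   assumed to be below 64. */
typedef struct {
    int index;
    uint64_t below; /* bit i set: a <= this, for some a with index(a) == i */
    uint64_t above; /* bit i set: this <= c, for some c with index(c) == i */
} Var;

/* Records a <= b; a and b must be distinct variables that share a scope,
   so their indices differ. */
void add_le(Var *a, Var *b) {
    assert(a->index != b->index);
    if (a->index < b->index)
        b->below |= UINT64_C(1) << a->index;
    else
        a->above |= UINT64_C(1) << b->index;
}

/* Constant-time membership test for a <= b. */
bool has_le(const Var *a, const Var *b) {
    assert(a->index != b->index);
    if (a->index < b->index)
        return (b->below >> a->index) & 1u;
    return (a->above >> b->index) & 1u;
}
```

For $\exists a.(\exists b.M) \wedge (\exists c.N)$, the variables $b$ and $c$ both get index 1, but since they never share a scope no set ever has to tell them apart.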



4 Related Work 

The motivation for this paper is to reduce the cost of the combination of subtyping
and polymorphism, and in this respect it is related to numerous papers
on constraint simplification techniques [...]. Our work is particularly related to
the work by Dussart, Henglein and Mossin on binding-time analysis with
binding-time-polymorphic recursion [DHM95], where they use constraint simplification
techniques in combination with a clever fixed-point iteration to obtain a
polynomial time algorithm. In his thesis Mossin applied these ideas to show that
a flow analysis with flow-polymorphic recursion can be implemented in polynomial
time. Our flow analysis in Appendix A, which we give as an example of how
constraint abstractions can be used, is based on this flow analysis. A consequence
of the complexity of our constraint solving algorithm is that the analysis can be
implemented in $O(n^3)$ time, where $n$ is the size of the explicitly typed program.
This is a substantial improvement over the algorithm by Mossin, which is $O(n^8)$
[Mos97].^2

To represent instantiation in the constraint language is not a new idea. It goes
back at least to Henglein's work on type-polymorphic recursion [Hen93], where he
uses semiunification constraints to represent instantiation. Although constraint
abstractions and semiunification constraints may have similar applications, they
are inherently different: semiunification constraints are inequality constraints of
the form $A \le B$ which constrain the (type) term $B$ to be an instance of $A$ by
an unknown substitution. In contrast, a call to a constraint abstraction denotes
a given instance of the constraints in the body of the abstraction.

Closely related to our work is the recent work by Rehof and Fähndrich [...],
where they also give an $O(n^3)$ algorithm for Mossin's flow analysis. The key idea
in their work and ours is the same: to represent substitution instantiation in
the constraints by extending the constraint language. However, the means are

^2 In his thesis Mossin states that he believes that the given algorithm can be improved.
In fact an early version of [DHM95] contained an $O(n^3)$ algorithm for binding-time
analysis, but it was removed from the final version since its correctness turned out
to be non-trivial (personal communication with Fritz Henglein).
not the same. Where we use constraint abstractions they use instantiation constraints,
a form of inequality constraints similar to semiunification constraints
but labelled with an instantiation site and a polarity. They compute the flow
information from the constraints through an algorithm for Context-Free Language
(CFL) reachability [Rep97, ...]. A key difference between constraint
abstractions and instantiation constraints is that constraint abstractions offer
more structure and a notion of local scope, whilst in the work by Rehof and
Fähndrich all variables scope over the entire set of constraints. Our algorithm
takes advantage of the scoping in an essential way. Firstly, we do not add any
edges between variables that have no common scope, and secondly the scoping
comes into the restriction of our transitivity rule and the unwind rules. Although
the scoping does not improve the asymptotic complexity in terms of the size of
the explicitly typed program, it shows up in the more fine-grained complexity
argument leading to the cubic bound (see Section 3.4), and it is essential for the
refined data structures we sketch in Section 3.5. Constraint abstractions also
offer a more subjective advantage: the additional structure of constraint
abstractions enforces many useful properties. As a result we think it will be easy
to use constraint abstractions in a wide range of type based analyses, and we
think that constraint abstractions will not lead to any additional difficulties
when establishing the soundness of the analyses.

We have previously used constraint abstractions to formulate usage-polymorphic
usage analyses with usage subtyping [...], and an algorithm
closely related to the one in this paper is presented in the second author's
master's thesis ([...] focuses on the usage type system and no constraint
solving is presented).



5 Conclusions and Future Work 

In this paper we have shown how a constraint language with simple inequality
constraints over a lattice can be extended with constraint abstractions, which
allow the constraints to compactly express substitution instantiation. The main
result of this paper is a constraint solving algorithm which computes least
solutions to the extended form of constraints in cubic time. In [...] we have
used this expressive constraint language to formulate a usage-polymorphic usage
analysis with usage subtyping and usage-polymorphic recursion, and in an
appendix to this paper we demonstrate how the extended constraint language
can be used to yield a cubic algorithm for Mossin's polymorphic flow analysis
with flow subtyping and flow-polymorphic recursion [Mos97]. We believe that
our approach can be applied to a number of other type based program analyses
such as binding-time analyses, e.g., [...], points-to analyses, e.g., [...],
and uniqueness type systems [BS96].

An interesting possibility for future work is to explore alternative constraint
solving algorithms. The current algorithm has a rather compositional character
in that it rewrites the body of a constraint abstraction without considering how
it is called. In [...] we describe an algorithm where the different calls to a
constraint abstraction lead to rewrites inside the abstraction. The algorithm can
in this way take advantage of global information (it can be thought of as a form
of caching), which yields an interesting, finer-grained complexity characterisation.
The algorithm in [...] is, however, restricted to non-recursive constraint
abstractions, and it is not clear whether the algorithm can be extended to recursive
constraint abstractions (although we believe so). Another opportunity for future
work is to investigate whether constraint abstractions can be a useful extension
for other underlying constraint languages. Constraint abstractions could also
possibly be made more powerful by allowing constraint abstractions to be passed as
parameters to constraint abstractions (i.e., making them higher order). Finally,
a practical comparison with Mossin's algorithm and the algorithm by Rehof and
Fähndrich remains to be done. The outcome of such a comparison is not clear
to us.
A Flow Analysis

In this appendix we illustrate how constraint abstractions can be used in practice.
As an example, we briefly present a flow-polymorphic type based flow analysis
with flow-polymorphic recursion. For another example see [...], where constraint
abstractions are used in usage analysis. The flow analysis is based on the
flow analysis by Mossin [Mos97], but we use our extended constraint language
with constraint abstractions. A similar analysis, but without polymorphic
recursion, is given by Faxén [Fax95]. For simplicity we restrict ourselves to a simply
typed functional language. Extending the analysis to a language with a
Hindley-Milner style type system is not difficult; see, for example, [Fax95]. A key result is
that the analysis can be implemented in $O(n^3)$ time, where $n$ is the size of the
explicitly typed program, which is a substantial improvement over the algorithm
by Mossin, which is $O(n^8)$ [Mos97].

The aim of flow analysis is to statically compute an approximation to the flow
of values during the execution of a program. To be able to pose flow questions
we will label subexpressions with unique flow variables. We will label expressions
in two distinct ways, as a source of flow or as a target of flow. We will use $e^{\alpha}$
as our notation for labelling $e$ (with flow variable $\alpha$) as a source of flow and $e_{\alpha}$
as our notation for labelling $e$ as a target of flow. If we are interested in the
flow of values from producers to consumers then we label all program points
where values are created as sources of flow, we label the points where values are
destructed as targets of flow, and we leave all other subexpressions unlabelled.
In the example below we have labelled all values as sources with flow variables
$\alpha_0$ through $\alpha_4$, and we have labelled the arguments to plus as targets with $\alpha_5$
and $\alpha_6$. We have not labelled the other consumers (the applications) to keep
the example less cluttered.

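$$\begin{array}{l}
\mathbf{let}\ apply = (\lambda f.(\lambda y.f\,y)^{\alpha_0})^{\alpha_1}\\
\mathbf{in\ let}\ id = (\lambda x.x)^{\alpha_2}\\
\mathbf{in}\ (apply\ id\ 5^{\alpha_3})_{\alpha_5} + (apply\ id\ 7^{\alpha_4})_{\alpha_6}
\end{array}$$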

We may now ask the question “which values may show up as arguments to
plus?”. Our flow analysis will give the answer that the value labelled with $\alpha_3$
(5) may flow to $\alpha_5$ (the first argument) and $\alpha_4$ (7) may flow to $\alpha_6$ (the second
argument). In this example the flow-polymorphic types that we assign to id
and apply play a crucial role. A monomorphic system would conservatively say
that both values could flow to both places. For some applications we might be
interested not only in the flow from producers to consumers, but also in the flow
to points on the way from a producer to a consumer. In our example we might
be interested in the flow to $x$ in the body of id. We then add a target label on
$x$ as in



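$$\begin{array}{l}
\mathbf{let}\ apply = (\lambda f.(\lambda y.f\,y)^{\alpha_0})^{\alpha_1}\\
\mathbf{in\ let}\ id = (\lambda x.x_{\alpha_7})^{\alpha_2}\\
\mathbf{in}\ (apply\ id\ 5^{\alpha_3})_{\alpha_5} + (apply\ id\ 7^{\alpha_4})_{\alpha_6}
\end{array}$$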

and then ask for the flow to $\alpha_7$. Our analysis would answer with $\alpha_3$ and $\alpha_4$. An
important property of the analysis is that the type of id remains polymorphic
even though we tap off the flow passing through $x$. Thus our type system
corresponds to the sticky interpretation of a type derivation in [Mos97]. The key
to this property is to distinguish between source labels and target labels. If the
label on $x$ were to serve as both a source and a target label, the flow through id
would be monomorphic.^3

^3 We can achieve this degrading effect by annotating $x$ both as a source and as a target
but using the same flow variable, i.e., as $(x^{\alpha})_{\alpha}$.
The language we consider is a lambda calculus extended with recursive
let-expressions, integers, lists and case-expressions. The grammar of the language is
as follows.



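$$\begin{array}{llcl}
\text{Variables} & x, y, z & & \\
\text{Flow Variables} & \alpha & & \\
\text{Expressions} & e & ::= & \lambda x.e \mid n \mid \mathbf{nil} \mid \mathbf{cons}\ e_0\ e_1 \mid x \mid e_0 + e_1 \mid e_0\ e_1 \\
 & & \mid & \mathbf{let}\ \{b\}\ \mathbf{in}\ e \mid \mathbf{case}\ e\ \mathbf{of}\ alts \mid e^{\alpha} \mid e_{\alpha} \\
\text{Bindings} & b & ::= & x = e \\
\text{Alternatives} & alts & ::= & \{\mathbf{nil} \to e_0;\ \mathbf{cons}\ x\ y \to e_1\}
\end{array}$$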



The language is simply typed and for our complexity result we assume that the 
terms are explicitly typed by having type annotations attached to every subterm. 
For our flow analysis we label the types of the underlying type system with flow 
variables. 



Flow Types $\tau ::= \mathrm{Int}^{\alpha} \mid (\tau \to \tau')^{\alpha} \mid (\mathrm{List}\ \tau)^{\alpha}$

We will let $\rho$ range over flow types without the outermost annotation. The
subtype entailment relation, which takes the form $M \vdash \tau_0 \le \tau_1$, is defined in
Figure 2. Recall that $M$ ranges over constraint terms as defined earlier.
We read $M \vdash \tau_0 \le \tau_1$ as “from the constraint term $M$ it can be derived that
$\tau_0 \le \tau_1$”. We will let $\sigma$ range over type schemas.

Type Schemas $\sigma ::= \forall\vec{\alpha}.\ f\,\vec{\alpha} \Rightarrow \tau$

Since the underlying type system is monomorphic, type schemas will only quantify
over flow variables. A type schema contains a call $f\,\vec{\alpha}$ to a constraint
abstraction which may constrain the quantified variables. We will let $\Theta$ and $\Delta$ range
over typing contexts, which associate variables with types or type schemas
depending on whether a variable is let-bound or not. We will use juxtaposition
as our notation for combining typing contexts. Our typing judgements take the
form $\Theta; M \vdash e : \tau$ for terms, $\Theta; F \vdash b : (x : \sigma)$ for bindings and $\Theta; \Gamma \vdash \{b\} : \Delta$
for groups of bindings. (Recall that $F$ ranges over constraint abstractions and
that $\Gamma$ ranges over sets of constraint abstractions.) The typing rules of the
analysis can be seen in Figure 3. The key difference from the type system in [Mos97]
is in the rule Binding, where generalisation takes place. Instead of putting the
constraint term used to type the body of the binding into the type schema, the
constraint term is inserted into a new constraint abstraction and a call to this
abstraction is included in the type schema.

To compute the flow in a program we can proceed as follows. First we compute
a principal typing of the program, which includes a constraint term where
the free variables are the flow variables labelling the program. We then apply
the algorithm from Section 3 and extract a set of atomic constraints which we
can view as a graph. If there is a path from $\alpha$ to $\alpha'$ then $\alpha$ may flow to $\alpha'$. The
typing rules as presented here are not syntax directed and cannot directly be
interpreted as describing an algorithm for computing principal typings. Firstly,


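$$\frac{}{\top \vdash \mathrm{Int} \le \mathrm{Int}} \qquad \frac{M \vdash \tau \le \tau'}{M \vdash \mathrm{List}\ \tau \le \mathrm{List}\ \tau'}$$

$$\frac{M \vdash \tau_0' \le \tau_0 \qquad N \vdash \tau_1 \le \tau_1'}{M \wedge N \vdash \tau_0 \to \tau_1 \le \tau_0' \to \tau_1'} \qquad \frac{M \vdash \rho_0 \le \rho_1}{M \wedge (\alpha \le \alpha') \vdash \rho_0^{\alpha} \le \rho_1^{\alpha'}}$$

Fig. 2. Subtyping rules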



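$$\frac{\Theta\{x : \tau\}; M \vdash e : \tau'}{\Theta; M \vdash \lambda x.e : (\tau \to \tau')^{\alpha}} \qquad \frac{}{\Theta; \top \vdash n : \mathrm{Int}^{\alpha}} \qquad \frac{}{\Theta; \top \vdash \mathbf{nil} : (\mathrm{List}\ \tau)^{\alpha}}$$

$$\frac{\Theta; M \vdash e_0 : \tau \qquad \Theta; N \vdash e_1 : (\mathrm{List}\ \tau)^{\alpha}}{\Theta; M \wedge N \vdash \mathbf{cons}\ e_0\ e_1 : (\mathrm{List}\ \tau)^{\alpha}}$$

$$\frac{}{\Theta\{x : \forall\vec{\alpha}.\ f\,\vec{\alpha} \Rightarrow \tau\}; f\,\vec{b} \vdash x : \tau[\vec{\alpha} := \vec{b}]} \qquad \frac{}{\Theta\{x : \tau\}; \top \vdash x : \tau}$$

$$\frac{\Theta; M \vdash e_0 : \mathrm{Int}^{\alpha_0} \qquad \Theta; N \vdash e_1 : \mathrm{Int}^{\alpha_1}}{\Theta; M \wedge N \vdash e_0 + e_1 : \mathrm{Int}^{\alpha}} \qquad \frac{\Theta; M \vdash e_0 : (\tau \to \tau')^{\alpha} \qquad \Theta; N \vdash e_1 : \tau}{\Theta; M \wedge N \vdash e_0\ e_1 : \tau'}$$

$$\frac{\Theta\Delta; \Gamma \vdash \{b\} : \Delta \qquad \Theta\Delta; M \vdash e : \tau}{\Theta; \mathbf{let}\ \Gamma\ \mathbf{in}\ M \vdash \mathbf{let}\ \{b\}\ \mathbf{in}\ e : \tau}\;(\textsc{Let})$$

$$\frac{\Theta; M \vdash e : \tau \qquad \Theta; N \vdash alts : \tau \Rightarrow \tau'}{\Theta; M \wedge N \vdash \mathbf{case}\ e\ \mathbf{of}\ alts : \tau'}$$

$$\frac{\Theta; M \vdash e_0 : \tau' \qquad \Theta\{x : \tau, y : (\mathrm{List}\ \tau)^{\alpha}\}; N \vdash e_1 : \tau'}{\Theta; M \wedge N \vdash \{\mathbf{nil} \to e_0;\ \mathbf{cons}\ x\ y \to e_1\} : (\mathrm{List}\ \tau)^{\alpha} \Rightarrow \tau'}$$

$$\frac{\Theta; M \vdash e : \rho^{a}}{\Theta; M \wedge (a \le c) \wedge (b \le c) \vdash e^{b} : \rho^{c}} \qquad \frac{\Theta; M \vdash e : \rho^{a}}{\Theta; M \wedge (a \le c) \wedge (a \le b) \vdash e_{b} : \rho^{c}}$$

$$\frac{}{\Theta; \emptyset \vdash \emptyset : \emptyset}\;(\textsc{Binding group-}\emptyset) \qquad \frac{\Theta; \Gamma \vdash \{\vec{b}\} : \Delta \qquad \Theta; F \vdash b : (x : \sigma)}{\Theta; \Gamma\{F\} \vdash \{\vec{b}, b\} : (\Delta, x : \sigma)}\;(\textsc{Binding group})$$

$$\frac{\Theta; M \vdash e : \tau \qquad \{\vec{\alpha}\} \cap FV(\Theta, f\,\vec{\alpha} = M, e) = \emptyset}{\Theta; f\,\vec{\alpha} = M \vdash x = e : (x : \forall\vec{\alpha}.\ f\,\vec{\alpha} \Rightarrow \tau)}\;(\textsc{Binding})$$

$$\frac{\Theta; M \vdash e : \tau \qquad N \vdash \tau \le \tau'}{\Theta; M \wedge N \vdash e : \tau'}\;(\textsc{Sub}) \qquad \frac{\Theta; M \vdash e : \tau \qquad \{a\} \cap FV(\Theta, e, \tau) = \emptyset}{\Theta; \exists a.M \vdash e : \tau}\;(\textsc{Exist-intro})$$

Fig. 3. Typing rules for a flow analysis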
the subsumption rule (Sub) and the rule (Exist-intro), which introduces existential
quantification in constraints, can be applied everywhere in a typing derivation.
This problem is solved by the standard approach of incorporating (Sub) and
(Exist-intro) into an appropriate subset of the other rules to obtain a
syntax-directed set of rules. Secondly, in the rule (Let) an inference algorithm would
have to come up with an appropriate $\Delta$. However, this only amounts to coming
up with fresh names: clearly, $\Delta$ would have to contain one type association of
the form $x : \sigma$ for each variable defined by the let-expression. Recall that $\sigma$ is
of the form $\forall\vec{\alpha}.\ f\,\vec{\alpha} \Rightarrow \tau$. We obtain $\tau$ simply by annotating the underlying type
with fresh flow variables. Since they are fresh we will be able to generalise over
all of them, so we can take $\vec{\alpha}$ to be these variables in some order. Finally we
generate the fresh name $f$ for the constraint abstraction. Note that no fixed-point
calculation is required, which is possible because we have recursive constraint
abstractions. Now let us apply the algorithm to our example program. We first
compute the constraint term in the principal typing, which yields the following.

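$$\begin{array}{l}
\mathbf{let}\ f_{apply}\ b_0\,b_1\,b_2\,b_3\,b_4\,b_5\,b_6 = \exists c_0.\exists c_1.\exists c_2.(b_3 \le b_0) \wedge (b_1 \le b_4) \wedge (b_2 \le c_2) \wedge {}\\
\qquad (c_1 \le b_5) \wedge (\alpha_0 \le b_5) \wedge (c_0 \le b_6) \wedge (\alpha_1 \le b_6)\\
\mathbf{in\ let}\ f_{id}\ b_0\,b_1\,b_2 = \exists c_0.\exists c_1.(b_0 \le c_1) \wedge (c_1 \le b_1) \wedge (c_1 \le \alpha_7) \wedge {}\\
\qquad (c_0 \le b_2) \wedge (\alpha_2 \le b_2)\\
\mathbf{in}\ \exists c_0.\ \ldots\ (f_{apply}\ c_0\,c_1\,c_2\,c_3\,c_4\,c_5\,c_6) \wedge (f_{id}\ c_0\,c_1\,c_2) \wedge {}\\
\qquad (c_7 \le c_3) \wedge (\alpha_3 \le c_3) \wedge (c_4 \le c_8) \wedge (c_4 \le \alpha_5) \wedge {}\\
\qquad (f_{apply}\ c_{10}\,c_{11}\,c_{12}\,c_{13}\,c_{14}\,c_{15}\,c_{16}) \wedge (f_{id}\ c_{10}\,c_{11}\,c_{12}) \wedge {}\\
\qquad (c_{17} \le c_{13}) \wedge (\alpha_4 \le c_{13}) \wedge (c_{14} \le c_{18}) \wedge (c_{14} \le \alpha_6)
\end{array}$$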

Then we apply the algorithm from Section 3 and extract the set of free live
atomic constraints, which is $\{\alpha_3 \le \alpha_5,\ \alpha_4 \le \alpha_6,\ \alpha_3 \le \alpha_7,\ \alpha_4 \le \alpha_7\}$. The paths in
this constraint set (viewed as a graph) are the result of the analysis.
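Reading the answer off the extracted constraints is plain graph reachability. The C sketch below encodes $\alpha_i$ as the integer $i$ and checks for a path with a depth-first search; `may_flow`, `NFLOW` and the array `example` are hypothetical names, and the edges are exactly the four constraints listed above.

```c
#include <assert.h>
#include <stdbool.h>

#define NFLOW 8   /* flow variables alpha0 .. alpha7 in the example */

/* The atomic constraint lo <= hi, read as an edge lo -> hi. */
typedef struct { int lo, hi; } Atom;

/* Is there a path from src to dst? Depth-first search; each vertex is
   pushed at most once, so a stack of NFLOW entries suffices. */
bool may_flow(const Atom *atoms, int m, int src, int dst) {
    bool seen[NFLOW] = {false};
    int stack[NFLOW], top = 0;
    seen[src] = true;
    stack[top++] = src;
    while (top > 0) {
        int v = stack[--top];
        if (v == dst) return true;
        for (int k = 0; k < m; k++)
            if (atoms[k].lo == v && !seen[atoms[k].hi]) {
                seen[atoms[k].hi] = true;
                stack[top++] = atoms[k].hi;
            }
    }
    return false;
}

/* Free live atomic constraints extracted for the example program:
   alpha3 <= alpha5, alpha4 <= alpha6, alpha3 <= alpha7, alpha4 <= alpha7. */
static const Atom example[] = {{3, 5}, {4, 6}, {3, 7}, {4, 7}};
```

Here `may_flow(example, 4, 3, 5)` and `may_flow(example, 4, 4, 7)` hold while `may_flow(example, 4, 3, 6)` does not, matching the polymorphic answer that 5 reaches only the first argument of plus.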

Finally, by inspecting the rules we can see that the size of the skeleton of
the constraint term required to type a program is proportional to the size of the
explicitly typed program, and that the number of free variables is the number
of flow variables in the program. From this fact and Theorem 2 we can conclude
that the complexity of the flow analysis is $O(n^3)$, where $n$ is the size of the typed
program.

B Proof of Theorem 1

In this appendix we give a proof of Theorem 1. We first introduce a form of
constraint term contexts, reminiscent of evaluation contexts used in operational
semantics, where the hole may not occur under any binder.

Evaluation Contexts $E ::= [\cdot] \mid E \wedge M \mid M \wedge E$

Note that the hole in an evaluation context is always live. We have the following
properties for evaluation contexts, which we state without proof.

Lemma 6.

1. $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\mathbf{let}\ \Gamma'\ \mathbf{in}\ M]$ is in normal form iff $\mathbf{let}\ \Gamma\Gamma'\ \mathbf{in}\ E[M]$ is in normal form.

2. $LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\mathbf{let}\ \Gamma'\ \mathbf{in}\ M]) = LIVE(\mathbf{let}\ \Gamma\Gamma'\ \mathbf{in}\ E[M])$.

3. If $a \notin FV(\Gamma, E)$ and $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.M]$ is in normal form then $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[M]$ is in normal form.

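The key to the proof of Theorem 1 is the following auxiliary relation.

Definition 8. We define an auxiliary relation $\theta; \Gamma \models^{\bullet} M$ as follows:
$\theta; \Gamma \models^{\bullet} M$ iff there exists $E$ such that

1. $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[M]$ is in normal form,

2. $\theta \models LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[M])$,

3. $FAV(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[M]) = \emptyset$.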

The technical core of the proof now shows up in the proof of the following lemma. 



Lemma 7. If $\theta; \Gamma \models^{\bullet} M$ then $\theta; \Gamma \models M$.

Before we proceed with the proof of this lemma we will use it to establish
Theorem 1.

Proof (Theorem 1). Assume the premise. The forward implication (if $\theta; \emptyset \models M$
then $\theta \models LIVE(M)$) follows from the fact that all syntactically live constraints are
semantically live. To show the backward implication (if $\theta \models LIVE(M)$ then $\theta; \emptyset \models M$),
assume that $\theta \models LIVE(M)$, which immediately gives $\theta; \emptyset \models^{\bullet} M$. Thus, by
Lemma 7, $\theta; \emptyset \models M$ as required.

Finally we prove Lemma 7.

Proof (Lemma 7). Recall that $\theta; \Gamma \models M$ is defined coinductively by the rules
in Figure 1. That is, $\models$ is defined as the largest fixed point of the functional $F$
expressed by the rules. By the coinduction principle we can show that ${\models^{\bullet}} \subseteq {\models}$
if we can show that ${\models^{\bullet}} \subseteq F({\models^{\bullet}})$. Thus we assume that $\theta; \Gamma \models^{\bullet} M$ and proceed
by case analysis on the structure of $M$.

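case $M = a \le b$: By the definition of $\theta; \Gamma \models^{\bullet} a \le b$ there exists $E$ which fulfils
the requirements in Definition 8. In particular, $\theta \models LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[a \le b])$.
Since $E$ cannot capture variables and the hole in $E$ is live, we know that
$a \le b \in LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[M])$, so $\theta \models a \le b$ and hence $\theta; \Gamma \mathrel{F(\models^{\bullet})} a \le b$.

case $M = \top$: Trivial.

case $M = K \wedge L$: To show that $\theta; \Gamma \mathrel{F(\models^{\bullet})} K \wedge L$ we need to show that
$\theta; \Gamma \models^{\bullet} K$ and $\theta; \Gamma \models^{\bullet} L$. We will only show the former; the latter follows
symmetrically. By the definition of $\theta; \Gamma \models^{\bullet} K \wedge L$ there exists $E$ which fulfils
the requirements in Definition 8. Take $E'$ to be $E[[\cdot] \wedge L]$. Then $E'$ is a witness
of $\theta; \Gamma \models^{\bullet} K$.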
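case $M = \mathbf{let}\ \Gamma'\ \mathbf{in}\ N$: We may without loss of generality (due to properties
of α-conversion) assume that the constraint abstraction variables defined by $\Gamma$ and
$\Gamma'$ are disjoint. To show that $\theta; \Gamma \mathrel{F(\models^{\bullet})} \mathbf{let}\ \Gamma'\ \mathbf{in}\ N$ we need to show that
$\theta; \Gamma\Gamma' \models^{\bullet} N$. By the definition of $\theta; \Gamma \models^{\bullet} \mathbf{let}\ \Gamma'\ \mathbf{in}\ N$ there exists $E$ which fulfils
the requirements in Definition 8. Floating of let bindings preserves normal forms
(Lemma 6), so we can float out $\Gamma'$ and obtain $\mathbf{let}\ \Gamma\Gamma'\ \mathbf{in}\ E[N]$ in normal form.
Also, by Lemma 6, $LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\mathbf{let}\ \Gamma'\ \mathbf{in}\ N]) = LIVE(\mathbf{let}\ \Gamma\Gamma'\ \mathbf{in}\ E[N])$.
Thus $E$ is a witness of $\theta; \Gamma\Gamma' \models^{\bullet} N$.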

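case $M = f\,\vec{b}$: By the definition of $\theta; \Gamma \models^{\bullet} f\,\vec{b}$ (in particular, since there are no free
constraint abstraction variables) we know that $f$ must be bound by $\Gamma$, i.e.,
$\Gamma = \Gamma'\{f\,\vec{a} = N\}$ for some $\Gamma'$ and some $N$. We are required to show
that $\theta; \Gamma'\{f\,\vec{a} = N\} \models^{\bullet} N[\vec{a} := \vec{b}]$. From $\theta; \Gamma \models^{\bullet} f\,\vec{b}$ we know that there exists
$E$ which fulfils the requirements in Definition 8. Normal forms are closed under
unwindings (Lemma 3), so $\mathbf{let}\ \Gamma'\{f\,\vec{a} = N\}\ \mathbf{in}\ E[N[\vec{a} := \vec{b}]]$ is in normal form.
Also, by Lemma 4,
$$LIVE(\mathbf{let}\ \Gamma'\{f\,\vec{a} = N\}\ \mathbf{in}\ E[f\,\vec{b}]) = LIVE(\mathbf{let}\ \Gamma'\{f\,\vec{a} = N\}\ \mathbf{in}\ E[N[\vec{a} := \vec{b}]]).$$
Thus $E$ is a witness of $\theta; \Gamma'\{f\,\vec{a} = N\} \models^{\bullet} N[\vec{a} := \vec{b}]$.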

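case $M = \exists a.N$: To show that $\theta; \Gamma \mathrel{F(\models^{\bullet})} \exists a.N$ we need to show that there
exists $d \in \mathcal{L}$ such that $\theta[a := d]; \Gamma \models^{\bullet} N$. Let

$$d = \bigsqcup \{\theta(a') \mid a' \ne a \text{ and } a' \le a \in LIVE(N)\}.$$

By the definition of $\theta; \Gamma \models^{\bullet} \exists a.N$ there exists $E$ which fulfils the requirements
in Definition 8. Without loss of generality (due to properties of α-conversion)
we can assume that $a \notin FV(\Gamma, E)$. Since $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N]$ is in normal form
and $a \notin FV(\Gamma, E)$, we know, by Lemma 6, that $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[N]$ is in normal
form. It remains to show that $\theta[a := d] \models LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[N])$. Given
$A \in LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[N])$ we proceed by the following cases.

subcase $A = a \le a$: Trivial.

subcase $A = b \le c$ where $b \ne a$ and $c \ne a$:
In this case $A \in LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N])$, so $\theta \models A$ and thus $\theta[a := d] \models A$.

subcase $A = b \le a$ where $b \ne a$: In this case $A \in LIVE(N)$ and thus
$\theta[a := d] \models A$ by the construction of $d$.

subcase $A = a \le b$ where $b \ne a$: In this case $a \le b \in LIVE(N)$. We will show
that $\theta(b)$ is an upper bound of

$$\{\theta(a') \mid a' \ne a \text{ and } a' \le a \in LIVE(N)\}$$

and, since $d$ is defined as the least upper bound, $\theta[a := d] \models a \le b$
follows. Now consider any $a'$ such that $a' \ne a$ and $a' \le a \in LIVE(N)$. Since
$a' \le a \in LIVE(N)$ and $a \le b \in LIVE(N)$, we know that
$\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N] \to \mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.(N \wedge a' \le b)]$, and since $\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N]$ is in normal form we know
that

$$LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.(N \wedge a' \le b)]) = LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N]).$$

Finally, since $a' \ne a$ it must be the case that $a' \le b \in LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.(N \wedge a' \le b)])$,
and thus $a' \le b \in LIVE(\mathbf{let}\ \Gamma\ \mathbf{in}\ E[\exists a.N])$. Hence $\theta \models a' \le b$, so
$\theta(a') \sqsubseteq \theta(b)$.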
Abstract. Complex data dependencies can often be expressed concisely
by defining a variable in terms of part of its own value. Such a circular
reference can be naturally expressed in a lazy functional language or in
an attribute grammar. In this paper, we consider circular references in
the context of an imperative C-like language, by extending the language
with a new construct, persistent variables. We show that an extension of
partial evaluation can eliminate persistent variables, producing a staged
C program. This approach has been implemented in the Tempo specializer
for C programs, and has proven useful in the implementation of
run-time specialization.



1 Introduction 

In compilation and program transformation, the treatment of a subcomponent
of a block of code often depends on some global properties of the code itself. A
compiler needs to know whether the source program ever uses the address of a
local variable, to decide whether the variable must be allocated on the stack [...].
A partial evaluator needs to know whether a dynamic (but non-side-effecting)
expression is referred to multiple times, to decide whether the expression should
be named using a let expression [...]. In the context of run-time specialization,
we have found that optimizing the specialized code based on its total size and
register usage can significantly improve its performance [...]. Such programs can
often be efficiently implemented as multiple phases, where early phases collect
information and later phases perform the transformation. This organization,
however, distributes the treatment of each subcomponent of the input across the
phases, which can introduce redundancy and complicate program understanding
and maintenance.

In a lazy language, we can implement these examples in a single traversal
using recursive declarations, as investigated by Bird [...]. The canonical example
of such a circular program is “repmin,” which reconstructs a tree such that the
value at each leaf is the least value at any leaf in the original tree. While repmin

* This research was partially supported by the Danish Natural Science Research
Council (PiT project).
can be implemented by first traversing the tree to detect the least value, and
then traversing the tree again to construct the result, the program can also be
expressed in a lazy language as follows [...]:

data Tree = Tip Int | Fork Tree Tree

rm :: Tree -> Tree
rm t = fst p
  where p = repmin t (snd p)

repmin :: Tree -> Int -> (Tree, Int)
repmin (Tip n) m = (Tip m, n)
repmin (Fork l r) m = (Fork t1 t2, min m1 m2)
  where (t1, m1) = repmin l m
        (t2, m2) = repmin r m

The variable p in rm represents a value that is the result of the traversal of the 
tree, but that is used in computing that result as well. Here, a lazy evaluation 
strategy suffices to order the computations such that the components of p are 
determined before they are used. 

Nevertheless, the use of a lazy language is not always appropriate. To resolve 
this dilemma, we propose a language extension, persistent variables, that can 
describe circular references in an imperative language. Using this facility, we can 
implement repmin imperatively as follows, where the persistent variable minval
implements the circular reference to p in the functional implementation:^1



typedef struct ans {
    int mn;
    Tree *tree;
} Ans;

void repmin(Tree *t, int m, Ans *a);

Tree *rm(Tree *t) {
    persistent int minval;   /* persistent, pread, pwrite: the proposed extension */
    Ans a;

    repmin(t, pread(minval), &a);
    pwrite(minval, a.mn);
    return a.tree;
}

void repmin(Tree *t, int m, Ans *a) {
    Ans a1, a2;

    if (t->type == Fork) {
        repmin(t->left, m, &a1);
        repmin(t->right, m, &a2);
        a->mn = min(a1.mn, a2.mn);
        a->tree = mkFork(a1.tree, a2.tree);
    }
    else /* (t->type == Tip) */ {
        a->mn = t->tipval;
        a->tree = mkTip(m);
    }
}


^ The structure Ans is used to return multiple values from the function repmin.

This implementation uses pread (“persistent read”) to reference the final value
to which minval is initialized using pwrite (“persistent write”). To execute the
program, we must first transform it so that it initializes minval before any
reference. In this paper, we show how to use partial-evaluation technology to
perform this transformation.

Traditionally, partial evaluation specializes a program with respect to a sub-
set of its inputs. The program is evaluated in two stages: the first stage
performs the static computations, which depend only on the known input. When
the rest of the input is available, the second stage evaluates the remaining
dynamic computations, producing the same result as the original program. With
minor extensions, we can use this framework to eliminate circularity by simply 
considering computations that depend on the persistent variables to be dynamic, 
and the other computations to be static. For example, in the above implemen- 
tation of repmin, the construction of each leaf of the output tree depends on 
the value of the persistent variable minval, via the parameter m of repmin, and 
is thus dynamic. The calculation of the least value in the tree depends only on 
the input tree, and is thus static. The staging performed by partial evaluation 
permits the persistent variables that only depend on static information to be 
initialized in the static phase and read in the dynamic phase. We have
implemented this approach in the Tempo partial evaluator for C programs,
developed in the Compose group at IRISA p].

This implementation of circularity leads naturally to incremental
specialization riTTI: if the value of a persistent variable depends on that of another
persistent variable, partial evaluation must be iterated. If there are recursive
dependencies among the persistent variables, however, the program cannot be
treated by our approach (cf. Section |E|).

The rest of this paper is organized as follows. Section 2 describes partial
evaluation in more detail. Section 3 presents the implementation of persistent
variables in the context of a partial evaluator for a simple imperative language.
Section 4 gives a semantics of the language with persistent variables and shows
that partial evaluation of a program preserves its semantics. Section 5 compares
our partial-evaluation-based approach to related techniques in the
implementation of attribute grammars. Section 6 provides some examples of the use of
persistent variables. Section 7 describes related work, and Section 8 concludes.

2 Specialization Using Partial Evaluation 

Partial evaluation uses interprocedural constant propagation to specialize a
program with respect to some of its inputs. We use offline partial evaluation,
in which each expression is annotated as static or dynamic by a preliminary
binding-time analysis phase. Two kinds of specialization can be performed in
this framework: program specialization and data specialization. Program
specialization transforms a program into an optimized implementation based on the
results of evaluating the static subexpressions H3. Static subexpressions are
replaced by their values, static conditionals are reduced, and static loops are
unrolled. Data specialization separates a program into two stages, known as the
loader and the reader, before the static data is available 0. The loader stores
the values of the static subexpressions in a data structure known as a cache. The
reader has the form of the source program, but with the static subexpressions
replaced by cache references. Because the loader and reader are independent of
the static data, conditionals are not reduced and loops are not unrolled.
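The difference between the two kinds of specialization can be illustrated on a small hypothetical function f, whose first argument s is static and whose second argument d is dynamic. The sketch below is hand-written, not produced by Tempo: f_s3 is what program specialization could yield once s = 3 is known, while f_ldr and f_rdr are a loader and reader written before s is known, in the style of Figure 3 (the static test is cached, followed by the static subexpression).

```c
#include <assert.h>

/* Hypothetical source function: s is static, d is dynamic. */
int f(int s, int d) {
    if (s > 0)
        return (s * s + 1) * d;
    else
        return d;
}

/* Program specialization with respect to s = 3: the static test is
   reduced and s * s + 1 is replaced by its value, 10. */
int f_s3(int d) {
    return 10 * d;
}

/* Data specialization, built before s is known. The loader caches the
   static test and, in the chosen branch, the static subexpression. */
int *f_ldr(int s, int *cache) {
    *cache = (s > 0);
    if (*cache++)
        *cache++ = s * s + 1;
    return cache;
}

/* The reader keeps the shape of f; static parts are read from the cache. */
int f_rdr(int d, const int *cache) {
    if (*cache++)
        return *cache * d;
    else
        return d;
}
```

The reader still contains the conditional, whose outcome it reads from the cache, whereas f_s3 has no conditional at all; this is the sense in which data specialization does not reduce static conditionals.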

We implement persistent variables in the context of data specialization.
Persistent variables and data specialization fit together well, because the
data-specialization cache is a natural means to transmit the values of persistent variables
from the static phase to the dynamic phase.

3 Data Specialization and Persistent Variables 

We now define data specialization for a simple imperative language with
persistent variables. Treating a richer language is straightforward; the implementation
handles the full subset of C accepted by Tempo 0, including pointers and
recursive functions.

3.1 Source Language 

A program consists of declarations d and a statement s, defined as follows: 

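$$
\begin{array}{lcl}
d \in \mathit{Declaration} & ::= & \texttt{int}\ x \mid \texttt{persistent int}\ p \\
s \in \mathit{Statement} & ::= & x = e \mid \texttt{pwrite}(p, e) \mid \texttt{if}\ (e)\ s_1\ \texttt{else}\ s_2 \\
 & \mid & \texttt{while}\ (e)\ s \mid \{s_1; \ldots; s_n\} \\
e \in \mathit{Expression} & ::= & c \mid x \mid e_1\ \mathit{op}\ e_2 \mid \texttt{pread}(p) \\
x \in \mathit{Variable} & & \\
p \in \mathit{Persistent\ variable} & & \mathit{Variable} \cap \mathit{Persistent\ variable} = \emptyset
\end{array}
$$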

A persistent variable can only appear as the first argument of pread or pwrite. 
Thus, a persistent variable is essentially a label, rather than a first-class value. 

3.2 Binding-Time Annotation 

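Binding times are static, S, and dynamic, D, where $S \sqsubseteq D$. The language of
binding-time annotated declarations $\bar{d}$, statements $\bar{s}$, and expressions $\bar{e}$ is defined
as follows:

$$
\begin{array}{lcl}
b \in \mathit{Binding\ time} & = & \{S, D\} \\
\bar{d} \in \mathit{BT\text{-}Declaration} & ::= & \texttt{int}\ x^{b} \mid \texttt{persistent int}\ p^{b} \\
\bar{s} \in \mathit{BT\text{-}Statement} & ::= & x =^{b} \bar{e}^{b'} \mid \texttt{pwrite}(p^{b}, \bar{e}^{b'}) \mid \texttt{if}\ (\bar{e}^{b})\ \bar{s}_1\ \texttt{else}\ \bar{s}_2 \\
 & \mid & \texttt{while}\ (\bar{e}^{b})\ \bar{s} \mid \{\bar{s}_1; \ldots; \bar{s}_n\} \\
\bar{e} \in \mathit{BT\text{-}Expression} & ::= & c \mid x \mid \bar{e}_1^{b_1}\ \mathit{op}\ \bar{e}_2^{b_2} \mid \texttt{pread}(p^{b})
\end{array}
$$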

Figure 1 presents inference rules that specify binding-time annotations for a
program, based on an environment $\Gamma$ mapping variables and persistent variables
to binding times. In the annotation of a program, $\Gamma(d)$ represents the binding
time associated by $\Gamma$ to the variable declared by $d$. The annotation of a statement
$s$ is described by $\Gamma, b_c \vdash_s s : \bar{s}$, where the binding time $b_c$ is the least upper bound
of the binding times of the enclosing conditional and loop tests. The annotation
of an expression $e$ is described by $\Gamma \vdash_e e : \bar{e}^{b}$. Annotations can be inferred
automatically using standard techniques HH.

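Programs:

$$
\frac{\Gamma, S \vdash_s s : \bar{s}}
     {\Gamma \vdash_p d_1 \ldots d_n\, s : d_1^{\Gamma(d_1)} \ldots d_n^{\Gamma(d_n)}\, \bar{s}}
$$

Statements:

$$
\frac{\Gamma[x \mapsto b] \vdash_e e : \bar{e}^{b'} \qquad b_c \sqcup b' \sqsubseteq b}
     {\Gamma[x \mapsto b], b_c \vdash_s x = e : x =^{b} \bar{e}^{b'}}
\qquad
\frac{\Gamma[p \mapsto b] \vdash_e e : \bar{e}^{b'} \qquad b_c \sqcup b' \sqsubseteq b}
     {\Gamma[p \mapsto b], b_c \vdash_s \texttt{pwrite}(p, e) : \texttt{pwrite}(p^{b}, \bar{e}^{b'})}
$$

$$
\frac{\Gamma \vdash_e e : \bar{e}^{b} \qquad \Gamma, b_c \sqcup b \vdash_s s_1 : \bar{s}_1 \qquad \Gamma, b_c \sqcup b \vdash_s s_2 : \bar{s}_2}
     {\Gamma, b_c \vdash_s \texttt{if}\ (e)\ s_1\ \texttt{else}\ s_2 : \texttt{if}\ (\bar{e}^{b})\ \bar{s}_1\ \texttt{else}\ \bar{s}_2}
$$

$$
\frac{\Gamma \vdash_e e : \bar{e}^{b} \qquad \Gamma, b_c \sqcup b \vdash_s s : \bar{s}}
     {\Gamma, b_c \vdash_s \texttt{while}\ (e)\ s : \texttt{while}\ (\bar{e}^{b})\ \bar{s}}
\qquad
\frac{\Gamma, b_c \vdash_s s_1 : \bar{s}_1 \quad \cdots \quad \Gamma, b_c \vdash_s s_n : \bar{s}_n}
     {\Gamma, b_c \vdash_s \{s_1; \ldots; s_n\} : \{\bar{s}_1; \ldots; \bar{s}_n\}}
$$

Expressions:

$$
\Gamma \vdash_e c : c^{S}
\qquad
\Gamma[x \mapsto b] \vdash_e x : x^{b}
\qquad
\Gamma[p \mapsto b] \vdash_e \texttt{pread}(p) : \texttt{pread}(p^{b})^{D}
\qquad
\frac{\Gamma \vdash_e e_1 : \bar{e}_1^{b_1} \qquad \Gamma \vdash_e e_2 : \bar{e}_2^{b_2}}
     {\Gamma \vdash_e e_1\ \mathit{op}\ e_2 : (\bar{e}_1^{b_1}\ \mathit{op}\ \bar{e}_2^{b_2})^{b_1 \sqcup b_2}}
$$

Fig. 1. Binding-time analysis

The rules of Figure 1 treat statements and expressions as follows. The
annotation of an assignment statement is determined by the binding time of the
assigned variable, which must be greater than or equal to the binding time of the
right-hand-side expression and the binding times of the enclosing conditional and
loop tests ($b_c$). The annotation of a pwrite statement is similarly constrained.
In the annotation of a conditional statement, the least upper bound of $b_c$ and the
binding time of the test expression is propagated to the analysis of the branches.
The annotation of a loop is similar. The result of a pread expression is always
dynamic; the binding time of its argument is obtained from the environment.
The treatment of the other constructs is straightforward.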
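The expression rules of Figure 1 amount to a short recursive function. The following C sketch assumes a hypothetical AST (Expr, with kinds CONST, VAR, OP, and PREAD) and represents Γ, restricted to ordinary variables, as an array indexed by variable number; encoding S and D as 0 and 1 makes the least upper bound a maximum.

```c
#include <assert.h>

/* Binding times, ordered S <= D; the least upper bound is the maximum. */
typedef enum { S = 0, D = 1 } BT;

BT lub(BT a, BT b) { return a > b ? a : b; }

/* Hypothetical AST for the expressions of Section 3.1. */
typedef enum { CONST, VAR, OP, PREAD } ExprKind;

typedef struct expr {
    ExprKind kind;
    int id;                      /* variable index, used by VAR */
    const struct expr *e1, *e2;  /* operands, used by OP        */
} Expr;

/* Returns the binding time b such that Gamma |-e e : e^b,
   where gamma[x] stands for Gamma(x). */
BT bt_expr(const Expr *e, const BT *gamma) {
    switch (e->kind) {
    case CONST: return S;                 /* constants are static        */
    case VAR:   return gamma[e->id];      /* looked up in Gamma          */
    case PREAD: return D;                 /* pread(p) is always dynamic  */
    case OP:    return lub(bt_expr(e->e1, gamma), bt_expr(e->e2, gamma));
    }
    return D;  /* not reached for well-formed expressions */
}
```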

This analysis is flow-insensitive and does not allow static assignments under 
dynamic conditionals and loops. These restrictions can be removed for ordinary 
variables by existing techniques j0|. Nevertheless, the binding times of persistent variables must be
flow-insensitive, to ensure that every assignment to a persistent variable occurs
before any access. An implementation strategy is to perform the binding-time
analysis in two phases. The first phase annotates all expressions except persis- 
tent variables, using a flow-sensitive analysis. The second phase annotates each 
persistent variable with the least upper bound of the binding times of all pwrite 
statements at which it is assigned. This second phase does not affect the bind- 
ing times of other terms, because the binding time of a pread expression is 
dynamic, independent of the binding-time of the associated persistent variable, 
and a pwrite statement has no return value. 
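The second phase described above is a single pass over the pwrite sites. In this sketch, the first, flow-sensitive phase is assumed to have already produced, for each pwrite(p, e), the binding time $b_c$ of the enclosing tests and the binding time of e; the PwriteSite record and the function annotate_persistent are hypothetical names. A persistent variable that is never written keeps the bottom element S in this sketch.

```c
#include <assert.h>

typedef enum { S = 0, D = 1 } BT;

BT lub(BT a, BT b) { return a > b ? a : b; }

/* One pwrite(p, e) site, as reported by the first (flow-sensitive) phase.
   Hypothetical record layout. */
typedef struct {
    int pvar;  /* index of the persistent variable p        */
    BT ctx;    /* b_c: lub of the enclosing test binding times */
    BT rhs;    /* binding time of e                          */
} PwriteSite;

/* Second phase: Gamma(p) is the lub, over all pwrite sites for p,
   of b_c joined with the binding time of e. */
void annotate_persistent(const PwriteSite *sites, int nsites,
                         BT *gamma_pers, int npers) {
    for (int p = 0; p < npers; p++)
        gamma_pers[p] = S;
    for (int i = 0; i < nsites; i++) {
        BT b = lub(sites[i].ctx, sites[i].rhs);
        gamma_pers[sites[i].pvar] = lub(gamma_pers[sites[i].pvar], b);
    }
}
```

A single write under dynamic control (the third site in the test) is enough to make a persistent variable dynamic, even if all its other writes are static.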

3.3 Data Specialization 

Data specialization stages the source program into a loader and a reader that 
communicate via a cache, which we represent as an array. We thus extend the 
language with constructs for manipulating cache elements: 
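$$
\begin{array}{lcl}
s \in \mathit{Statement} & ::= & \ldots \mid \{\texttt{Cache}\ x; s\} \mid {*e_1} = e_2 \\
e \in \mathit{Expression} & ::= & \ldots \mid {*e} \\
x \in \mathit{Variable} \cup \{\texttt{cache}, \texttt{tmp}\} & &
\end{array}
$$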

The statement {Cache x; s} is a block that declares a pointer x into the cache.
Two such pointers are cache, which is initialized to an external array, and tmp,
which is used in the translation of dynamic conditionals and loops. The
indirect assignment *e1 = e2 and the indirect reference *e initialize and access cache
elements, respectively. In practice, bounds checks and casts into and out of a
generic cache-element type are added as needed. We refer to this language as
the target language, and the sublanguage of Section 3.1 as the source language.
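The bounds checks and generic cache-element type mentioned above might look as follows. This is a sketch under our own assumptions: a C union (CacheElt) plays the role of the generic element type, so member selection replaces explicit casts, and cache_at performs the bounds check for an access *(cache + off). None of these names come from Tempo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic cache element: one slot can hold any cached value. */
typedef union {
    int i;
    double d;
    void *ptr;
} CacheElt;

/* Bounds of the external cache array. */
typedef struct {
    CacheElt *start;  /* corresponds to cache_start         */
    CacheElt *end;    /* one past the last usable entry     */
} CacheBounds;

/* Bounds-checked *(cache + off). The index is computed relative to start,
   so no out-of-range pointer is formed before the check. */
CacheElt *cache_at(CacheBounds b, const CacheElt *cache, ptrdiff_t off) {
    ptrdiff_t idx = (cache - b.start) + off;
    assert(idx >= 0 && idx < b.end - b.start);
    return b.start + idx;
}
```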

Figure 2 presents the transformation rules for data specialization of
statements and expressions. The transformation of a statement is described by
$i \vdash_d \bar{s} : (l, r, i')$, where $i$ is the offset from cache of the next free cache entry, $l$ is
a statement representing $\bar{s}$ in the loader, $r$ is a statement representing $\bar{s}$ in the
reader, and $i'$ is the offset from cache of the next free cache entry after executing
either $l$ or $r$. By keeping track of the cache offset $i$, we reduce the number of
assignments to the cache pointer. The transformation of an expression is described
by $i \vdash_d \bar{e}^{b} : (l, v, r, i')$, where $i$ and $i'$ represent cache offsets as for statements, $l$
is a statement initializing the cache with the values of the static subexpressions
of $\bar{e}^{b}$, $v$ is an expression to use in the loader to refer to the static value of $\bar{e}^{b}$,
if it has one, and $r$ is an expression representing the value of $\bar{e}^{b}$ for use in the
reader. In the definition of the transformation, $e$ is the result of removing all
binding-time annotations from the annotated expression $\bar{e}^{b}$.

The transformation treats statements and expressions as follows. We use
cache entries to implement static persistent variables. Thus, $\texttt{pwrite}(p^{S}, \bar{e}^{S})$ is
translated into an indirect assignment to the entry allocated to $p$. This
assignment is placed in the loader. Similarly, $\texttt{pread}(p^{S})^{D}$ is translated into an
indirect reference to the corresponding entry. This reference is placed in the reader.
References and assignments to a dynamic persistent variable are placed in the
reader. The treatment of the remaining constructs is standard jf)l 1 4 ]. We include
the translation of static conditionals here to give a flavor of the adjustments to
the cache pointer needed to implement branching control-flow constructs. The
complete treatment of conditionals and while loops is presented in the appendix.

We conclude with $\mathcal{DS}[\![p]\!]_\Gamma$, the transformation of a program $p$ with respect
to a binding-time environment $\Gamma$. Let $\bar{d}_1 \ldots \bar{d}_n\, \bar{s}$ be the result of annotating $p$
with respect to $\Gamma$. Suppose that the first $m$ declarations of $p$ declare the static
persistent variables $p_1, \ldots, p_m$. If $m \vdash_d \bar{s} : (l, r, i')$, then $\mathcal{DS}[\![p]\!]_\Gamma$ is:

    d_{m+1} ... d_n
    {
      Cache cache, p_1, ..., p_m;
      cache = cache_start; p_1 = cache; ... ; p_m = cache + m - 1;
      l; cache = cache_start; r
    }
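Statements:

$$
i \vdash_d x =^{S} \bar{e}^{S} : (x = e,\ \{\},\ i)
\qquad
i \vdash_d \texttt{pwrite}(p^{S}, \bar{e}^{S}) : ({*p} = e,\ \{\},\ i)
$$

$$
\frac{i \vdash_d \bar{e}^{b} : (l, v, r, i')}
     {i \vdash_d x =^{D} \bar{e}^{b} : (l,\ x = r,\ i')}
\qquad
\frac{i \vdash_d \bar{e}^{b} : (l, v, r, i')}
     {i \vdash_d \texttt{pwrite}(p^{D}, \bar{e}^{b}) : (l,\ \texttt{pwrite}(p, r),\ i')}
$$

$$
\frac{i \vdash_d \bar{e}^{S} : (l, v, r, i') \qquad i' \vdash_d \bar{s}_1 : (l_1, r_1, i_1) \qquad i' \vdash_d \bar{s}_2 : (l_2, r_2, i_2)}
     {\begin{array}{l}
       i \vdash_d \texttt{if}\ (\bar{e}^{S})\ \bar{s}_1\ \texttt{else}\ \bar{s}_2 : \\
       \quad (\{l;\ \texttt{if}\ (v)\ \{l_1;\ \texttt{cache} = \texttt{cache} + i_1\}\ \texttt{else}\ \{l_2;\ \texttt{cache} = \texttt{cache} + i_2\}\}, \\
       \quad\ \ \texttt{if}\ (r)\ \{r_1;\ \texttt{cache} = \texttt{cache} + i_1\}\ \texttt{else}\ \{r_2;\ \texttt{cache} = \texttt{cache} + i_2\}, \\
       \quad\ \ 0)
     \end{array}}
$$

$$
\frac{i_0 \vdash_d \bar{s}_1 : (l_1, r_1, i_1) \quad \cdots \quad i_{n-1} \vdash_d \bar{s}_n : (l_n, r_n, i_n)}
     {i_0 \vdash_d \{\bar{s}_1; \ldots; \bar{s}_n\} : (\{l_1; \ldots; l_n\},\ \{r_1; \ldots; r_n\},\ i_n)}
$$

Expressions:

$$
i \vdash_d \bar{e}^{S} : ({*(\texttt{cache}+i)} = e,\ {*(\texttt{cache}+i)},\ {*(\texttt{cache}+i)},\ i+1)
\qquad
i \vdash_d x^{D} : (\{\},\ 0,\ x,\ i)
$$

$$
i \vdash_d \texttt{pread}(p^{S})^{D} : (\{\},\ 0,\ {*p},\ i)
\qquad
i \vdash_d \texttt{pread}(p^{D})^{D} : (\{\},\ 0,\ \texttt{pread}(p),\ i)
$$

$$
\frac{i \vdash_d \bar{e}_1^{b_1} : (l_1, v_1, r_1, i_1) \qquad i_1 \vdash_d \bar{e}_2^{b_2} : (l_2, v_2, r_2, i_2)}
     {i \vdash_d (\bar{e}_1^{b_1}\ \mathit{op}\ \bar{e}_2^{b_2})^{D} : (\{l_1; l_2\},\ 0,\ r_1\ \mathit{op}\ r_2,\ i_2)}
$$

Fig. 2. Data specialization of statements and expressions (excerpts)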



The generated program first initializes cache to the beginning of the cache and 
the static persistent variables to cache entries. Next the loader I is executed. 
Finally, the value of cache is reset, and the reader r is executed. 
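As a worked instance of the transformation (hand-applied, not Tempo output), consider a hypothetical three-statement program in which pread(p) occurs before the pwrite that defines p. With x and p static and y dynamic, the rules of Figure 2 put the assignment to x, the cached subexpression x * 2, and the translated pwrite in the loader, and leave only the assignment to y in the reader; since m = 1, the translation of the body starts at offset 1. In C, with an int array as the cache:

```c
#include <assert.h>

#define CACHE_SIZE 8

/* Hypothetical source program, with Gamma(x) = S, Gamma(y) = D, Gamma(p) = S:
 *
 *   int x; int y; persistent int p;
 *   { x = 5; y = pread(p) + x * 2; pwrite(p, x + 1); }
 *
 * pread(p) occurs before pwrite(p, ...) in the source. */
int x;                        /* static: assigned by the loader  */
int y;                        /* dynamic: assigned by the reader */
int cache_start[CACHE_SIZE];  /* the external cache array        */

int run_specialized(void) {
    int *cache, *p;

    cache = cache_start;
    p = cache;                /* p_1 = cache + 0 (m = 1)            */

    /* loader l */
    x = 5;                    /* x =^S 5                            */
    *(cache + 1) = x * 2;     /* static x * 2 cached at offset 1    */
    *p = x + 1;               /* pwrite(p^S, x + 1) becomes *p = e  */

    cache = cache_start;      /* reset the cache pointer            */

    /* reader r */
    y = *p + *(cache + 1);    /* pread(p^S) becomes *p              */
    return y;
}
```

The loader stores 6 into p's entry before the reader reads it, so y becomes 6 + 10 = 16, which is the value required by the circular reading of the source program.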



4 Correctness 

We now relate the semantics of the result of data specialization to the semantics
of the source program. We begin with the semantics of the target language, which
is a superset of the source language. The semantics depends on a store $\sigma$ mapping
locations to values. For conciseness, we implicitly associate each variable $x$ with
the store location $x$. The semantics of a statement $s$ is specified by $\sigma \vdash^{E} s : \sigma'$, for
stores $\sigma$ and $\sigma'$. The semantics of an expression $e$ is specified by $\sigma \vdash^{E} e : v$, where
$\sigma$ is a store and $v$ is a value. We only describe the semantics of the constructs
manipulating persistent variables; the other constructs are standard, and are
deferred to the appendix.

The semantics must ensure that every reference to a persistent variable using
pread sees the final value to which the variable is assigned using pwrite. We
use two distinct store locations $p^{in}$ and $p^{out}$ to represent each persistent variable
$p$. The location $p^{in}$ holds the final value of $p$, while the location $p^{out}$ records
updates to $p$. Thus, the semantics of pread and pwrite are as follows, where
undefined is some value distinct from the value of any expression:

$$
\frac{v \neq \mathit{undefined}}
     {\sigma[p^{in} \mapsto v] \vdash^{E} \texttt{pread}(p) : v}
\qquad
\frac{\sigma \vdash^{E} e : v}
     {\sigma \vdash^{E} \texttt{pwrite}(p, e) : \sigma[p^{out} \mapsto v]}
$$

The values stored in $p^{in}$ and $p^{out}$ are connected by the semantics of a complete
program, specified as follows:

Definition 1. Let $p$ be a program $d_1 \ldots d_n\, s$ declaring the variables $x_1, \ldots, x_k$
and the persistent variables $p_1, \ldots, p_q$. Let $\sigma_0$ be a store mapping the $x_i$ to the
initial values of the $x_i$, and the $p_j^{out}$ to “undefined”. Then, the meaning of $p$, $[\![p]\!]$,
is the set of stores $\sigma$, binding only the $x_i$, such that for some values $v_1, \ldots, v_q$

$$
\sigma_0 \cup \{p_j^{in} \mapsto v_j \mid 1 \le j \le q\} \vdash^{E} s : \sigma \cup \{p_j^{in} \mapsto v_j,\ p_j^{out} \mapsto v_j \mid 1 \le j \le q\}
$$

This definition uses the value undefined to ensure that the meaning of a program 
represents computations in which the value of a persistent variable is only read 
when it is also defined. 
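Definition 1 can be read as a guess-and-check specification: a run is admissible only if the value read from $p^{in}$ equals the final value written to $p^{out}$. For a small hypothetical program { x = 5; y = pread(p) + x * 2; pwrite(p, x + 1); }, the check can be written in C as below; the Store layout and the use of INT_MIN as undefined are assumptions of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Sentinel for "undefined" (assumption: no expression of the toy program
   evaluates to INT_MIN). */
#define UNDEFINED INT_MIN

/* Store for the toy program: ordinary variables x and y, plus the two
   locations p_in and p_out that represent the persistent variable p. */
typedef struct {
    int x, y;
    int p_in, p_out;
} Store;

/* Source semantics of { x = 5; y = pread(p) + x * 2; pwrite(p, x + 1); }:
   pread consults p_in and pwrite updates p_out. Returns false when pread
   would read "undefined", i.e. when no derivation exists. */
bool run_source(Store *s) {
    s->x = 5;
    if (s->p_in == UNDEFINED)
        return false;
    s->y = s->p_in + s->x * 2;
    s->p_out = s->x + 1;
    return true;
}

/* Definition 1 for this program: a guess v for p is acceptable only if
   running from p_in = v (with p_out undefined) ends with p_out = v. */
bool consistent(int v, Store *result) {
    Store s = { 0, 0, v, UNDEFINED };
    if (!run_source(&s) || s.p_out != v)
        return false;
    *result = s;
    return true;
}
```

Only the guess 6 survives, so the meaning of this program is the single store with x = 5 and y = 16.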

We now show that data specialization preserves the semantics. Data spe- 
cialization separates a program into the loader, which only manipulates static 
variables, and the reader, which only manipulates dynamic variables. To relate 
the semantics of the loader and reader to the semantics of the source program, 
we first define the operators stat and dyn, which separate the input store into 
static and dynamic components: 

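Definition 2. Let $\sigma$ be a store and $\Gamma$ be a binding-time environment. Let $p^{out}$
and $p^{in}$ be locations that are unique for each persistent variable $p$, but that may
be identical to each other. Then,

1. $\mathit{stat}(\sigma, \Gamma) = \{x \mapsto \sigma(x) \mid \Gamma(x) = S\} \cup \{p \mapsto p^{out},\ p^{out} \mapsto \sigma(p^{out}) \mid \Gamma(p) = S\}$

2. $\mathit{dyn}(\sigma, \Gamma) = \{x \mapsto \sigma(x) \mid \Gamma(x) = D\} \cup \{p \mapsto p^{in},\ p^{in} \mapsto \sigma(p^{in}) \mid \Gamma(p) = S\} \cup \{p^{in} \mapsto \sigma(p^{in}),\ p^{out} \mapsto \sigma(p^{out}) \mid \Gamma(p) = D\}$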

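To relate the stores $\mathit{stat}(\sigma, \Gamma)$ and $\mathit{dyn}(\sigma, \Gamma)$ back to $\sigma$, we must eliminate the
intermediate locations $p^{in}$ and $p^{out}$. We thus define the operator $\uplus$:

Definition 3. For binding-time environment $\Gamma$ and stores $\alpha$ and $\beta$,

$$
\begin{array}{rcl}
\alpha \uplus_\Gamma \beta & = & \{x \mapsto \alpha(x) \mid \Gamma(x) = S\} \cup \{x \mapsto \beta(x) \mid \Gamma(x) = D\} \\
 & \cup & \{p^{in} \mapsto \beta(p^{in}),\ p^{out} \mapsto \alpha(p^{out}) \mid \Gamma(p) = S\} \\
 & \cup & \{p^{in} \mapsto \beta(p^{in}),\ p^{out} \mapsto \beta(p^{out}) \mid \Gamma(p) = D\}
\end{array}
$$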

The operators stat, dyn, and $\uplus$ are related by the following lemma:

Lemma 1. $\mathit{stat}(\sigma, \Gamma) \uplus_\Gamma \mathit{dyn}(\sigma, \Gamma) = \sigma$

The loader and reader also use some store locations not present in the store
used by the source programs, namely the cache pointer, the local tmp variables,
and the cache. For conciseness, we specify the store with respect to which the
loader and reader are executed as a sequence $\sigma, c, \tau, \xi$ of:

1. The bindings ($\sigma$) associated with the source variables.

2. The cache pointer ($c$).

3. A stack ($\tau$) representing the values of the locally declared tmp variables.

4. The cache ($\xi$).

The following theorem shows that given the loader and reader associated with 
a source statement, execution of the loader followed by resetting of the cache 
pointer followed by execution of the reader has the same effect on the values of 
the source variables as execution of the source statement. 

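Theorem 1. For any statement $s$ and store $\sigma$, if $\Gamma, b_c \vdash_s s : \bar{s}$ and $i \vdash_d \bar{s} :
(l, r, i')$, and if for some $c$, $\tau$, and $\xi$ there are $\alpha$, $\beta$, $c'$, $\tau'$, and $\xi'$ such that

1. $\mathit{stat}(\sigma, \Gamma), c, \tau, \xi \vdash^{E} l : \alpha, c', \tau', \xi'$

2. $\mathit{dyn}(\sigma, \Gamma), c, \tau, \xi' \vdash^{E} r : \beta, c', \tau', \xi'$

then $\sigma \vdash^{E} s : \alpha \uplus_\Gamma \beta$.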
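Theorem 1 can be checked by hand on a hypothetical toy program { x = 5; y = pread(p) + x * 2; pwrite(p, x + 1); } with x and p static and y dynamic. In the sketch below, stores are C structs; stat_part, dyn_part, and merge play the roles of stat, dyn, and $\uplus_\Gamma$ (leaving the pointer bindings for p implicit), and the loader and reader are hand translations that share an int array as the cache. All names are ours.

```c
#include <assert.h>

/* Hypothetical store: Gamma(x) = S, Gamma(y) = D, Gamma(p) = S. In the
   loader, p designates p_out; in the reader, p designates p_in. */
typedef struct {
    int x, y;
    int p_in, p_out;
} Store;

/* stat(sigma, Gamma): keeps x and p_out; other fields are unbound (0). */
Store stat_part(Store s) { Store r = { 0, 0, 0, 0 }; r.x = s.x; r.p_out = s.p_out; return r; }

/* dyn(sigma, Gamma): keeps y and p_in. */
Store dyn_part(Store s) { Store r = { 0, 0, 0, 0 }; r.y = s.y; r.p_in = s.p_in; return r; }

/* alpha (+) beta: x and p_out from alpha, y and p_in from beta. */
Store merge(Store a, Store b) { Store r = { a.x, b.y, b.p_in, a.p_out }; return r; }

/* Source statement, run directly. */
Store run_source(Store s) {
    s.x = 5;
    s.y = s.p_in + s.x * 2;
    s.p_out = s.x + 1;
    return s;
}

/* Loader: static assignment, cached x * 2, and pwrite(p^S, x + 1). */
Store run_loader(Store a, int *cache) {
    a.x = 5;
    cache[1] = a.x * 2;
    a.p_out = a.x + 1;
    return a;
}

/* Reader: dynamic assignment, reading the cache and p_in. */
Store run_reader(Store b, const int *cache) {
    b.y = b.p_in + cache[1];
    return b;
}
```

The loader must run before the reader, so the test sequences the two calls explicitly rather than passing both as arguments to merge, whose argument evaluation order C leaves unspecified. The equality holds for an arbitrary initial value of $p^{in}$; it is Definition 1 that then selects the run in which $p^{in}$ and $p^{out}$ agree.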

Because of speculative evaluation of terms under dynamic control (see the
appendix), termination of the source statement does not necessarily imply
termination of the loader. Thus, we relate the semantics of the source program to
that of the loader and reader using the following theorem, which includes the
hypothesis that the loader terminates.

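Theorem 2. For any statement $s$ and store $\sigma$, if $\Gamma, b_c \vdash_s s : \bar{s}$ and $i \vdash_d \bar{s} :
(l, r, i')$, and there are some $\sigma'$, $c$, $\tau$, and $\xi$ such that

1. $\sigma \vdash^{E} s : \sigma'$

2. for some store $\sigma''$, $\mathit{stat}(\sigma, \Gamma), c, \tau, \xi \vdash^{E} l : \sigma''$

then there are some $\alpha$, $\beta$, $c'$, $\tau'$, and $\xi'$ such that

1. $\mathit{stat}(\sigma, \Gamma), c, \tau, \xi \vdash^{E} l : \alpha, c', \tau', \xi'$

2. $\mathit{dyn}(\sigma, \Gamma), c, \tau, \xi' \vdash^{E} r : \beta, c', \tau', \xi'$

3. $\sigma' = \alpha \uplus_\Gamma \beta$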

Both theorems are proved by induction on the height of the derivation of
$\Gamma, b_c \vdash_s s : \bar{s}$. These theorems imply the following, which shows that the store
resulting from execution of the source statement and the store resulting from
execution of the result of data specialization agree on the variables $x$, which
determine the semantics of a program as specified by Definition 1.

Corollary 1. For any program $p = d_1 \ldots d_n\, s$, binding-time environment $\Gamma$,
and store $\sigma$, if $\mathcal{DS}[\![p]\!]_\Gamma = d'_1 \ldots d'_k\, s_{ds}$, $\sigma \vdash^{E} s : \sigma'$, and $\sigma \vdash^{E} s_{ds} : \sigma'_{ds}$, then for all
variables $x$ of $p$, $\sigma'(x) = \sigma'_{ds}(x)$.

The corollary holds because $s_{ds}$ initially sets up a store that is compatible with
the store assumed by the above theorems and resets the cache pointer between
the execution of the loader and the reader.
5 Circularity and Attribute Grammars



The efficient implementation of circular specifications has been studied exten- 
sively in the attribute-grammar community |l 311 SI I ]^rz[WZ I j . As is the case for 
offline partial evaluation, these approaches begin with a dependency analysis, the 
results of which then guide code generation. Nevertheless, the partial-evaluation 
and attribute-grammar-based approaches differ in their starting point and in the 
quantity of information collected by the analyses. Our approach also differs from 
attribute-grammar-based approaches in that our source language is imperative. 

The starting point of an attribute-grammar-based approach is an attribute 
grammar describing the input of the program to be generated. An analysis de- 
termines the dependencies between the attributes, and uses this information 
to construct a series of schedules, known as visit sequences, of the attribute
computations such that each computation depends only on previously-computed 
attributes. Each element of a visit sequence can be implemented as a function 
whose argument is the component of the input for which the corresponding at- 
tribute is to be computed. The resulting implementation amounts to a series 
of recursive traversals of the input structure [E|. Specific techniques have been 
devised to transmit intermediate values (“bindings” m) that are not part of 
the input from one phase to the next, and to create specialized input structures 
(“visit trees” pazu) that eliminate the need to maintain portions of the input 
that are not needed by subsequent phases. 

In contrast, the starting point of data specialization is the program itself; 
data specialization is independent of the structure of the input data. The con- 
struction of both glue and visit trees is subsumed by the basic mechanism of data 
specialization: the caching of the value of every static expression that occurs in 
a dynamic context. The cache can, nevertheless, be viewed as implementing a 
specialized visit tree. The cached values of static conditional tests correspond to 
sum-type tags, while the values cached for the chosen branch of a static condi- 
tional correspond to the components of the sum element. In our implementation, 
this tree structure is flattened; pointers from one cache entry to another are only
introduced to implement speculative evaluation of dynamic control constructs
(see Appendix E).

The binding-time analysis used by data specialization is significantly less in- 
formative than the analysis used in the implementation of attribute grammars. 
We have seen that binding-time analysis simply classifies each expression as 
static or dynamic, according to the classification of the terms that it depends 
on. This strategy implies that a persistent variable that depends on the value of 
another instance of itself is considered dynamic. The attribute-grammar anal- 
ysis collects complete information about the relationships between attributes. 
This extra precision allows fewer dependencies to be considered recursive. The 
impact of replacing the binding-time analysis of data specialization by the more 
informative attribute-grammar analysis is a promising area for future research. 

Finally, we have presented an implementation of circularity in the context of 
an imperative language, whereas attribute grammars use a declarative notation, 
with some similarity to a lazy functional language m. Because imperative
languages are flow sensitive, we have to indicate explicitly which variables should
be considered persistent, and thus take on the final value to which they are
assigned, rather than the most recent value. In contrast, in a flow-insensitive
language, such as a declarative or functional language, a variable has only one
value, which is its final value. The addition of circularity to such a language
simply allows the value to be specified after the point of reference, rather than
requiring that it be specified before, and no keyword is required.

6 Examples 

We now present some examples: the translation of the imperative definition of 
repmin, a use of our approach in the implementation of run-time specialization 
in Tempo, and two examples from the literature on circular programs. 

6.1 Repmin 

Figure 3 shows the result of applying data specialization to the imperative
implementation of repmin presented in Section 1. The loader, consisting of rm_ldr
and repmin_ldr, accumulates the minimum value at any Tip. Once the complete
tree has been analyzed, this value is stored in the cache location assigned to the
persistent variable minval. The reader, consisting of rm_rdr and repmin_rdr,
uses this value to reconstruct the tree.

In the implementation of Figure 3, calls to the primitives mkTip and mkFork
are considered to be dynamic, and thus placed in the reader. It is also possible
to consider these calls to be static, in which case the output structure is built
in the loader. Following this strategy, the reader recursively visits the tips,
instantiating the value of each to be the minimum value. When part of the output
structure does not depend on the values of persistent variables, this strategy
implies that the binding-time analysis considers its construction to be completely
static, which can reduce the amount of data stored in the cache.
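The alternative strategy, in which mkTip and mkFork are static, can be sketched in C as follows. This is a hand-written illustration under assumed names (repmin_build, repmin_fill, rm_static_output) and an assumed Tree layout, not the code Tempo would generate: the loader builds the output tree with placeholder tips while computing the minimum, and the reader walks the prebuilt tree and fills in the tips.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree representation (the paper does not show Tree). */
typedef enum { Tip, Fork } Kind;

typedef struct tree {
    Kind type;
    int tipval;
    struct tree *left, *right;
} Tree;

Tree *mkTip(int n) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->type = Tip; t->tipval = n; t->left = t->right = NULL;
    return t;
}

Tree *mkFork(Tree *l, Tree *r) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->type = Fork; t->tipval = 0; t->left = l; t->right = r;
    return t;
}

/* Loader: builds the output structure (static) and computes the minimum.
   Output tips get a placeholder value that the reader overwrites. */
Tree *repmin_build(const Tree *t, int *mn) {
    if (t->type == Tip) {
        *mn = t->tipval;
        return mkTip(0);
    }
    int m1, m2;
    Tree *l = repmin_build(t->left, &m1);
    Tree *r = repmin_build(t->right, &m2);
    *mn = m1 < m2 ? m1 : m2;
    return mkFork(l, r);
}

/* Reader: recursively visits the tips of the prebuilt output tree,
   instantiating each with the minimum value m. */
void repmin_fill(Tree *out, int m) {
    if (out->type == Tip) {
        out->tipval = m;
        return;
    }
    repmin_fill(out->left, m);
    repmin_fill(out->right, m);
}

Tree *rm_static_output(const Tree *t) {
    int minval;  /* stands in for the persistent variable */
    Tree *out = repmin_build(t, &minval);
    repmin_fill(out, minval);
    return out;
}
```

Here the shape of the output is fixed entirely in the first phase; only the tip values depend on the persistent variable.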

6.2 Inlining 

This work was motivated by the problem of optimizing run-time specialization 
in Tempo H5|. An important optimization is the inlining of specialized functions. 
Inlining is performed during the execution of a dedicated specializer (generating
extension CH) written in C, and is thus most naturally implemented in C as well.
We have found that the use of persistent variables facilitates the implementation 
of various inlining strategies, by requiring only local changes that do not affect 
the overall implementation of run-time specialization. 

To achieve good performance, the size of a specialized function should not
exceed the size of the instruction cache or the distance expressible by a relative
branch instruction. One approach is to constrain inlining based on the number of
instructions already generated for the current function. A more precise approach
is to constrain inlining based on the size of the complete specialized function. To
implement these strategies, we use data specialization and persistent variables
to separate the implementation into a pass that analyzes the size, followed by a
pass that performs the inlining and code generation.

^ The code produced by Tempo has been slightly modified for readability. Among
these simplifications, we exploit the fact that only integers are stored in the cache
to eliminate casts to and from a generic cache type.

/* forward declarations */
int *repmin_ldr(Tree *t, Ans *a, int *cache);
Tree *rm_rdr(int *cache);
int *repmin_rdr(int m, Ans *a, int *cache);

/* Loader: computes the minimum and stores it in minval's cache entry */
void rm_ldr(Tree *t, int *cache) {
  Ans a;
  int *minval_ptr = cache++;

  cache = repmin_ldr(t, &a, cache);
  *minval_ptr = a.mn;
}

int *repmin_ldr(Tree *t, Ans *a, int *cache) {
  Ans a1, a2;

  *cache = (t->type == Fork);   /* cache the static test */
  if (*cache++) {
    cache = repmin_ldr(t->left, &a1, cache);
    cache = repmin_ldr(t->right, &a2, cache);
    a->mn = min(a1.mn, a2.mn);
  }
  else a->mn = t->tipval;
  return cache;
}

Tree *rm(Tree *t) {
  int *cache = mkCache();
  rm_ldr(t, cache);
  return rm_rdr(cache);
}

/* Reader: rebuilds the tree from the cached tests and minval */
Tree *rm_rdr(int *cache) {
  Ans a;
  int minval = *cache++;

  cache = repmin_rdr(minval, &a, cache);
  return a.tree;
}

int *repmin_rdr(int m, Ans *a, int *cache) {
  Ans a1, a2;

  if (*cache++) {
    cache = repmin_rdr(m, &a1, cache);
    cache = repmin_rdr(m, &a2, cache);
    a->tree = mkFork(a1.tree, a2.tree);
  }
  else a->tree = mkTip(m);
  return cache;
}

Fig. 3. Data specialization of repmin



Inlining Based on Current Function Size: The heart of the implementation
is the function do_call, shown with binding-time annotations in Figure 4. The
arguments to do_call are the number of instructions already generated for the
caller (caller_size), the name of the callee (callee), and the buffer into which
code for the caller is generated (caller_output). The treatment of a call
proceeds in three steps. First, callee_output is initialized to the address at which
to generate code for the specialized callee. If the specialized callee is to be inlined,
callee_output is set to the current position in the caller’s buffer, as indicated
by caller_output. Otherwise, mkFun is used to allocate a new buffer that is
the size of the specialized callee. Because the decision of whether to inline and
the size of the specialized callee are not known until after specialization of the
callee, this information is implemented using the persistent variables inlinedp
and callee_sizep, respectively. Next, the callee is specialized by applying the
function spec to the callee’s source code and the selected output buffer.
Specialization emits code in the output buffer and returns the size of the specialized
function. Finally, the number of instructions already generated for the caller is
combined with the size of the specialized callee to determine whether the callee
should be inlined. The persistent variables are then initialized accordingly, and
other initializations are performed as indicated by the comments in Figure 4.
The return value is the number of instructions added to the caller.

extern Code *the_program[];
extern int threshold;

int do_call(int caller_size, int callee, Code *caller_output) {
  int inlined, callee_size;
  persistent int inlinedp, callee_sizep;
  Code *callee_output;

  /* select the output buffer based on whether the call is inlined */
  if (pread(inlinedp))
    callee_output = caller_output;
  else
    callee_output = mkFun(pread(callee_sizep));

  /* specialize the callee */
  callee_size = spec(the_program[callee], callee_output);

  /* initializations based on whether the call is inlined */
  inlined = (callee_size + caller_size <= threshold);
  pwrite(inlinedp, inlined);
  if (inlined)
    /* return the number of instructions added to the caller */
    return callee_size;
  else {
    /* end the callee */
    *(callee_output + callee_size) = RETURN;

    /* record the callee's size */
    pwrite(callee_sizep, callee_size + 1);

    /* add a call instruction to the caller */
    *caller_output = mkCall(get_name(callee_output));

    /* return the number of instructions added to the caller */
    return 1;
  }
}

Fig. 4. The do_call function used in the implementation of inlining based on
current function size. Dynamic constructs are underlined.

The loader produced by data specialization computes the size of each gener- 
ated function and determines whether it should be inlined. The reader then uses 
this information to allocate the output buffer and perform code generation. 
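To make the division of work between the two passes concrete, here is a minimal sketch in plain C under a deliberately simplified, hypothetical model: a single caller whose calls are handled in order, with the sizes of the specialized callees given up front instead of computed by spec. The names CallInfo, inline_loader, and inline_reader_alloc are ours, not Tempo's; the loader mirrors the decision and return value of do_call, and the reader side only totals the buffer space that mkFun would allocate.

```c
#include <assert.h>

/* Hypothetical per-call record of the values do_call would pwrite. */
typedef struct {
    int inlined;      /* value written to inlinedp                     */
    int buffer_size;  /* value written to callee_sizep (0 if inlined)  */
} CallInfo;

/* Loader: for each call in turn, decide whether to inline, as do_call
   does, and return the final size of the caller. */
int inline_loader(const int *callee_size, int n, int caller_size,
                  int threshold, CallInfo *info) {
    for (int k = 0; k < n; k++) {
        info[k].inlined = (callee_size[k] + caller_size <= threshold);
        if (info[k].inlined) {
            info[k].buffer_size = 0;
            caller_size += callee_size[k];             /* code added inline */
        } else {
            info[k].buffer_size = callee_size[k] + 1;  /* body plus RETURN  */
            caller_size += 1;                          /* one call instr.   */
        }
    }
    return caller_size;
}

/* Reader (allocation part only): total space mkFun would allocate for
   the callees that are not inlined. */
int inline_reader_alloc(const CallInfo *info, int n) {
    int total = 0;
    for (int k = 0; k < n; k++)
        if (!info[k].inlined)
            total += info[k].buffer_size;
    return total;
}
```

With callee sizes {3, 10, 2}, an initial caller size of 5, and a threshold of 12, the first and third calls are inlined, the second gets a separate 11-instruction buffer, and the caller ends up with 11 instructions.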



Controlling Inlining Based on Maximum Possible Function Size: The
previous approach takes into account only the number of instructions generated
for the caller so far. A more accurate approach is to consider the caller’s total
size. For this purpose we add a new persistent variable local_sizep recording
the total size of the current function before inlining.

The use of the dynamic value of the persistent variable local_sizep to deter- 
mine whether to inline implies that the value of the persistent variable inlinedp 
depends on dynamic information. Thus, we must iterate data specialization, pro- 
ducing a three-phase implementation. The first phase calculates the number of 
instructions generated for each source function, if no inlining is performed. The 
second phase decides whether to inline each call, based on the sum of the number 
of instructions in the specialized caller and the number of instructions added by 
inlining all selected calls. The third phase performs the actual code generation. 



6.3 Other Circular Programs 



We now consider several examples from the literature on attribute-grammar- 
based and lazy implementations of circular programs. These examples illustrate 
the limitations of an approach based on binding-time analysis. 

The Block language has been used by Saraiva et al. to illustrate numerous
strategies for generating efficient implementations of attribute grammars
jl 9l21)i‘22|. Block is a block-structured language in which the scope of a
variable declaration is any nested block, including nested blocks to the left of the
declaration. The language thus generalizes the common use of forward references
to top-level functions, extending this facility to local variables. The scoping rule
is illustrated by the following program, in which braces delimit a nested block,
“int x” is a declaration of x, and x is a use of x:

{ { x } int x }



We focus on one of the compilation problems that has been studied for Block,
that of translating Block code to a stack-based language [...]. Circularity
arises when the compiler needs to determine the stack offset of a variable that
occurs before its declaration has been processed.

Saraiva et al. present two implementations of such a compiler: a lazy
implementation and a strict implementation generated from an attribute grammar
[...]. Both implementations represent the environment as two variables, which
we refer to as the local environment and the complete environment. The local
environment contains all of the declarations for the enclosing blocks, but only
the variables declared to the left of the current position in the current block.
As new declarations are encountered, they are recorded in this environment. The
complete environment contains all of the declarations that should be visible at
the current point, including those that occur to the right of the current
position. This environment is used for code generation. In the compilation of a
block, the environments are connected using a circular reference: the input
value of the complete environment is the local environment that results from
processing the block.

This implementation can be directly translated into our language by using 
a persistent variable to implement the complete environment. If we follow this 
strategy, however, our approach is unable to eliminate the circularity. Because 
the initial value of the local environment is the dynamic value of the persistent 
variable representing the complete environment of the enclosing block, the local 
environment is always dynamic, and cannot be used to initialize the complete 
environment of the nested block to a static value. The strict implementation of 
Saraiva et al. resolves this dependency by a strategy analogous to calling the 
loader for the treatment of a block from the reader, rather than from the loader 
of the context. Within the reader, the complete environment for the surrounding 
context has been determined. It is, however, not clear how to infer the need for 
this implementation strategy using binding-time analysis. 
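Python is strict, so the circular definition cannot be transcribed directly. The sketch below instead computes the local environment of a block first (the loader's job) and only then generates code with it as the complete environment (the reader's job), invoking the loader of a nested block from the reader, in the spirit of the strict implementation just described. The program encoding (`("decl", x)`, `("use", x)`, `("block", items)`), the one-slot-per-declaration stack layout and the `load` instruction are assumptions made for illustration.

```python
def compile_flat(items, enclosing_complete, next_slot=0):
    """Compile one block to a list of stack instructions.

    items: list of ("decl", name), ("use", name) or ("block", sub_items).
    enclosing_complete: complete environment (name -> stack slot) of the
    enclosing block. Returns (code, next free slot).
    """
    # Loader: extend the enclosing environment with this block's own
    # declarations, yielding the local environment at the end of the block.
    local = dict(enclosing_complete)
    for kind, arg in items:
        if kind == "decl":
            local[arg] = next_slot
            next_slot += 1

    # The circular reference: the complete environment of the block is the
    # local environment obtained after processing the whole block.
    complete = local

    # Reader: generate code using the complete environment, so a use may
    # refer to a declaration that appears later in the block.
    code = []
    for kind, arg in items:
        if kind == "use":
            code.append(f"load {complete[arg]}")
        elif kind == "block":
            nested, next_slot = compile_flat(arg, complete, next_slot)
            code.extend(nested)
    return code, next_slot


# { { x } int x }: the inner use refers to the later declaration.
code, _ = compile_flat([("block", [("use", "x")]), ("decl", "x")], {})
```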

By slightly reorganizing the structure of the environment, we can implement 
the Block compiler using persistent variables and data specialization. The local 
environment of a nested block extends the enclosing block’s complete environ- 
ment, but does not otherwise depend on its contents. We thus replace the flat 
environment generated by Saraiva et al.’s implementation by an environment
constructed of frames, such that an empty, and thus static, frame is allocated
on entry to each block. The loader adds the declarations made by the block to
this frame, which is then stored in a persistent variable at the end of the block.
When the reader enters the block, it extends the complete environment with
this new frame, producing an environment suitable for code generation. This
program structure is also used in the second inlining example. The
reorganization corresponds roughly to reformulating a computation f(d) into
d ⊕ f(s), where s is some static initial value and ⊕ is some operation, thus
permitting the computation of f to be considered static.
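The frame-based reorganization can be sketched in the same hypothetical Python setting. Each block starts from an empty frame that depends on nothing dynamic; the loader fills it, and the reader pushes it onto the enclosing complete environment. Read loosely, the empty frame plays the role of the static initial value s, filling it is f(s), and pushing it onto the enclosing environment d is the operation ⊕. The program encoding, slot numbering and `load` instruction are again assumptions.

```python
def load_frame(items, next_slot):
    """Loader: record the block's own declarations in a fresh frame.
    The frame starts empty, so building it does not depend on the
    enclosing environment."""
    frame = {}
    for kind, arg in items:
        if kind == "decl":
            frame[arg] = next_slot
            next_slot += 1
    return frame, next_slot


def lookup(env, name):
    """Search the frames from the innermost (last) to the outermost."""
    for frame in reversed(env):
        if name in frame:
            return frame[name]
    raise KeyError(name)


def compile_framed(items, env=(), next_slot=0):
    """Reader: extend the enclosing environment with the block's frame
    (the d + f(s) shape) and generate code against the result."""
    frame, next_slot = load_frame(items, next_slot)
    complete = list(env) + [frame]
    code = []
    for kind, arg in items:
        if kind == "use":
            code.append(f"load {lookup(complete, arg)}")
        elif kind == "block":
            nested, next_slot = compile_framed(arg, complete, next_slot)
            code.extend(nested)
    return code, next_slot


code, _ = compile_framed([("block", [("use", "x")]), ("decl", "x")])
```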

A related example is Karczmarczuk’s use of circularity to concisely implement
complex mathematical operations in a lazy functional language [...]. Like
the compiler of the Block language, Karczmarczuk’s implementation has the
property that a circular value from an enclosing computation is used in
performing a subcomputation. Here, however, the value from the enclosing
computation is used eagerly, and it is not clear how to perform a rewriting
analogous to the conversion of f(d) into d ⊕ f(s).
7 Related Work

The most closely related work is the automatic efficient implementation of
attribute-grammar specifications, which has been discussed in Section [...].
Here, we review the history of data specialization and of multiple levels of
specialization.

Data specialization: Automatic data specialization was initially developed by
Barzdins and Bulyonkov [...], and described and extended by Malmkjær [...].
These approaches are more complex than ours. In particular, they use
memoization in the construction of the data specialization cache. Knoblock and
Ruf implement data specialization for a subset of C and investigate its use in
an interactive graphics application [...]. Chirokoff et al. compare the benefits
of program and data specialization, and propose combining these techniques [...].
Our implementation of data specialization in Tempo builds on that of Chirokoff.

Incremental specialization: We have proposed to iterate binding-time analysis
and data specialization to resolve dependencies among a hierarchy of persistent
variables. Marlet, Consel, and Boinot similarly iterate the specialization
process to achieve incremental run-time program specialization [...].
Alternatively, Glück and Jørgensen define a binding-time analysis and program
specializer that treat multiple levels at once [...]. Their analysis should be
applicable to our approach.

8 Conclusions and Future Work 

In this paper, we have shown how circular programs can be implemented using a 
minor extension of standard partial evaluation techniques. Previously developed 
techniques to generate optimized implementations of circular specifications are 
naturally achieved by the basic strategy of caching the values of static expressions 
that occur in a dynamic context. We have found the use of persistent variables 
crucial in experimenting with a variety of optimization strategies for run-time 
specialization in Tempo. Because the introduced code is localized, and a staged 
program is generated automatically, variants can be implemented robustly and 
rapidly. 

In future work, we plan to allow persistent variables as first-class values.
Given the set of analyses already performed by Tempo [...], this extension should
be straightforward. We also plan to investigate whether the information collected
by the analysis used for attribute grammars can be useful in the context of partial 
evaluation. We hope that the work presented here will lead to further exchange of 
techniques between the attribute-grammar and partial-evaluation communities. 
A Data Specialization of Branching Statements

The data specialization rules for conditionals and while loops are shown in
Figure 5. The principal problem here is to maintain the cache pointer. For static
control constructs, all possible control paths must set the cache pointer such
that a single constant offset i can be used after the control construct. The
speculative evaluation performed for dynamic control constructs implies that
the loader initializes cache entries corresponding to code that is not executed
in the reader. Thus, the cache itself has to record which cache entries to skip,
according to the control path chosen in the reader. While speculative evaluation
is not essential, it has been found useful in practice [...].

The specialization rules in Figure 5 create a cache entry for the value of
the test of each static conditional and for the value of the test performed on 
each static while loop iteration. A more efficient approach is to collapse nested 
static conditionals into a switch statement and to replace the recording of the 
values of while loop tests by the recording of the number of loop iterations. Both 
optimizations have been implemented in Tempo. 
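The difference between the two encodings of a static while loop can be illustrated with a hand-written loader/reader pair in Python. This is not code produced by Tempo: the loop `out = out * d + (i * i + k)`, with static `n` and `k` and dynamic `d`, and the list used as the cache are hypothetical. The first pair records every static loop test, in the manner of the rules of Figure 5; the second records only the iteration count.

```python
def direct(n, k, d):
    """Unspecialized program: n and k are static, d is dynamic."""
    out, i = 0, 0
    while i < n:
        out = out * d + (i * i + k)
        i += 1
    return out


def loader_tests(n, k):
    """Loader recording the value of each static loop test."""
    cache, i = [], 0
    while True:
        test = i < n
        cache.append(test)        # one entry per evaluation of the test
        if not test:
            return cache
        cache.append(i * i + k)   # static subexpression of the body
        i += 1


def reader_tests(cache, d):
    out, pos = 0, 0
    while cache[pos]:             # replay the recorded tests
        out = out * d + cache[pos + 1]
        pos += 2                  # advance the cache pointer by one iteration
    return out


def loader_count(n, k):
    """Loader recording the number of iterations instead of each test."""
    values = [i * i + k for i in range(n)]
    return [len(values)] + values


def reader_count(cache, d):
    out = 0
    for j in range(cache[0]):
        out = out * d + cache[1 + j]
    return out
```

For n = 4 the first cache holds nine entries and the second five; both readers compute the same result as the original loop.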



B Semantics 



The complete semantics of statements and expressions is shown in Figure 6.
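Static conditional:

$$
\frac{i \vdash e^{S} : (l, v, r, i') \qquad i' \vdash s_1 : (l_1, r_1, i_1) \qquad i' \vdash s_2 : (l_2, r_2, i_2)}
{\begin{array}{l}
i \vdash \mathtt{if}\ (e^{S})\ s_1\ \mathtt{else}\ s_2 : \\
\quad (\{l;\ \mathtt{if}\ (v)\ \{l_1;\ \mathtt{cache} = \mathtt{cache}+i_1\}\ \mathtt{else}\ \{l_2;\ \mathtt{cache} = \mathtt{cache}+i_2\}\}, \\
\quad\ \ \mathtt{if}\ (r)\ \{r_1;\ \mathtt{cache} = \mathtt{cache}+i_1\}\ \mathtt{else}\ \{r_2;\ \mathtt{cache} = \mathtt{cache}+i_2\}, \\
\quad\ \ 0)
\end{array}}
$$

Dynamic conditional:

$$
\frac{i \vdash e^{D} : (l, v, r, i') \qquad i'+1 \vdash s_1 : (l_1, r_1, i_1) \qquad i_1+1 \vdash s_2 : (l_2, r_2, i_2)}
{\begin{array}{l}
i \vdash \mathtt{if}\ (e^{D})\ s_1\ \mathtt{else}\ s_2 : \\
\quad (\{\mathtt{Cache\ tmp};\ l;\ \mathtt{tmp} = \mathtt{cache}+i';\ l_1;\ \mathtt{*tmp} = \mathtt{cache};\ \mathtt{tmp} = \mathtt{cache}+i_1;\ l_2;\ \mathtt{*tmp} = \mathtt{cache}\}, \\
\quad\ \ \mathtt{if}\ (r)\ \{r_1;\ \mathtt{cache} = \mathtt{*(cache}+i_1\mathtt{)}\}\ \mathtt{else}\ \{\mathtt{cache} = \mathtt{*(cache}+i'\mathtt{)};\ r_2\}, \\
\quad\ \ i_2)
\end{array}}
$$

Static while loop:

$$
\frac{i \vdash e^{S} : (l, v, r, i') \qquad i' \vdash s : (l_s, r_s, i_s)}
{\begin{array}{l}
i \vdash \mathtt{while}\ (e^{S})\ s : \\
\quad (\{l;\ \mathtt{while}\ (v)\ \{l_s;\ \mathtt{cache} = \mathtt{cache}+i_s-i;\ l\}\}, \\
\quad\ \ \mathtt{while}\ (r)\ \{r_s;\ \mathtt{cache} = \mathtt{cache}+i_s-i\}, \\
\quad\ \ i_s)
\end{array}}
$$

Dynamic while loop:

$$
\frac{i \vdash e^{D} : (l, v, r, i') \qquad i'+1 \vdash s : (l_s, r_s, i_s)}
{\begin{array}{l}
i \vdash \mathtt{while}\ (e^{D})\ s : \\
\quad (\{\mathtt{Cache\ tmp};\ l;\ \mathtt{tmp} = \mathtt{cache}+i';\ l_s;\ \mathtt{*tmp} = \mathtt{cache}\}, \\
\quad\ \ \{\mathtt{Cache\ tmp};\ \mathtt{tmp} = \mathtt{cache};\ \mathtt{while}\ (r)\ \{r_s;\ \mathtt{cache} = \mathtt{tmp}\};\ \mathtt{cache} = \mathtt{*(cache}+i'\mathtt{)}\}, \\
\quad\ \ i_s)
\end{array}}
$$
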
Fig. 5. Data specialization of branching statements 



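Statements:

$$
\frac{\sigma \vdash e : v}{\sigma \vdash x = e : \sigma[x \mapsto v]}
\qquad
\frac{\sigma \vdash e_1 : \ell \quad \sigma \vdash e_2 : v}{\sigma \vdash {*e_1} = e_2 : \sigma[\ell \mapsto v]}
\qquad
\frac{\sigma \vdash e : v}{\sigma \vdash \mathtt{pwrite}(p, e) : \sigma[p^{out} \mapsto v]}
$$

$$
\frac{\sigma \vdash e : 1 \quad \sigma \vdash s_1 : \sigma'}{\sigma \vdash \mathtt{if}\ (e)\ s_1\ \mathtt{else}\ s_2 : \sigma'}
\qquad
\frac{\sigma \vdash e : 0 \quad \sigma \vdash s_2 : \sigma'}{\sigma \vdash \mathtt{if}\ (e)\ s_1\ \mathtt{else}\ s_2 : \sigma'}
$$

$$
\frac{\sigma \vdash e : 1 \quad \sigma \vdash s : \sigma' \quad \sigma' \vdash \mathtt{while}\ (e)\ s : \sigma''}{\sigma \vdash \mathtt{while}\ (e)\ s : \sigma''}
\qquad
\frac{\sigma \vdash e : 0}{\sigma \vdash \mathtt{while}\ (e)\ s : \sigma}
$$

$$
\frac{\sigma \vdash s_1 : \sigma_1 \quad \cdots \quad \sigma_{n-1} \vdash s_n : \sigma_n}{\sigma \vdash \{s_1; \ldots; s_n\} : \sigma_n}
\qquad
\frac{\sigma[x \mapsto \mathit{undefined}] \vdash s : \sigma'[x \mapsto v]}{\sigma \vdash \{\mathtt{Cache}\ x;\ s\} : \sigma'}
$$

Expressions:

$$
\sigma \vdash c : c
\qquad
\sigma[x \mapsto v] \vdash x : v
\qquad
\frac{\sigma \vdash e_1 : v_1 \quad \sigma \vdash e_2 : v_2}{\sigma \vdash e_1\ \mathit{op}\ e_2 : v_1\ \mathit{op}\ v_2}
$$

$$
\frac{\sigma[\ell \mapsto v] \vdash e : \ell \quad v \neq \mathit{undefined}}{\sigma[\ell \mapsto v] \vdash {*e} : v}
\qquad
\sigma[p^{in} \mapsto v] \vdash \mathtt{pread}(p) : v
$$
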
Fig. 6. Semantics of statements and expressions 
Abstract. In this paper we extend the well-known combination of forward and
backward static analyses in abstract interpretation to the verification of
complex temporal properties of transition systems. First, we show that this
combination, whose results are often better than those obtained by using both
analyses separately, can be used to check simple temporal properties with just
one fixpoint. Then we extend this result to more complex temporal properties,
including a superset of CTL in the case of non-game properties, and a superset
of ATL in the case of game properties.



1 Introduction 

Abstract interpretation [...] is a formal method for inferring general properties
of a program. When the program is described as a transition system, two kinds of
analyses can be done: backward analysis and forward analysis. Forward analysis
simulates program computations, whereas backward analysis simulates reverse
computations. Both analyses can be combined to obtain much better results,
since each analysis may reduce the loss of precision introduced by the other
[...]. However, only restricted kinds of properties, expressed in the μ-calculus
as intersections of properties of the form $\nu X.(p \wedge \Diamond X)$ and
$\mu X.(p \vee \Diamond X)$, are used, including for user-provided assertions [...]¹.

More complex temporal properties (such as CTL and ATL [...]) are commonly
checked in the model-checking approach [...]. In that case, abstractions are
commonly used to solve state explosion problems. However, model-checking tools
use either backward or forward analyses but usually do not combine them, since
one analysis is enough for finite concrete systems, and reversible temporal
logics [...] are not used for specifications (forward logics, which use only
predecessor operators, are used instead²).

In this paper, we show that forward analysis can still be combined with backward
analysis in many model-checking temporal specifications. We will study non-game
properties (a subset of μ-calculus formulas), and game properties (a subset of
Aμ formulas [...]). This combination can lead to better results, especially when
using widening and narrowing techniques [...] to deal with infinite abstract
lattices.

¹ In fact, the implemented tool [...] can be used to prove the negation of these
properties, not these properties themselves.

² It is shown in [...] that state-based abstractions are not complete when
checking reversible temporal specifications.

2 Standard Combination of Backward and Forward Analysis

The combination of backward and forward analyses was originally introduced
in Cousot’s thesis [...] in order to approximate the intersection of backward and
forward transition system collecting semantics. It is widely known and used in
abstract interpretation, since it makes it possible to combine information given
by abstract backward and forward operators, and thus to reduce the effects of
the loss of information due to the abstraction.

In this section we recall the results that justify the correctness of this com- 
bination. 

2.1 Combination of Fixpoints 

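Lemma 1. Let $P^\natural(\sqsubseteq^\natural, \bot^\natural, \top^\natural, \sqcap^\natural, \sqcup^\natural)$ and
$P^\sharp(\sqsubseteq^\sharp, \bot^\sharp, \top^\sharp, \sqcap^\sharp, \sqcup^\sharp)$ be complete lattices with a Galois
connection $P^\natural \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} P^\sharp$, let $F^\natural \in P^\natural \to P^\natural$ and
$B^\natural \in P^\natural \to P^\natural$ be two monotonic functions, and let

$$L^\natural = \mathrm{gfp}\ \lambda Z.(Z \sqcap^\natural F^\natural(Z) \sqcap^\natural B^\natural(Z))$$

If $F^\sharp \in P^\sharp \to P^\sharp$ and $B^\sharp \in P^\sharp \to P^\sharp$ are monotonic and satisfy
$\alpha \circ F^\natural \circ \gamma \sqsubseteq^\sharp F^\sharp$ and $\alpha \circ B^\natural \circ \gamma \sqsubseteq^\sharp B^\sharp$, then the sequence
$(X_n)_{n \in \mathbb{N}}$ defined by $X_0 = \top^\sharp$, $X_{2n+1} = X_{2n} \sqcap^\sharp F^\sharp(X_{2n})$, and
$X_{2n+2} = X_{2n+1} \sqcap^\sharp B^\sharp(X_{2n+1})$ is such that:

$$\forall n \geq 0,\quad \alpha(L^\natural) \sqsubseteq^\sharp X_{n+1}$$

The optimality of this approach has been proved in [...]: it has been shown
that $L^\sharp = \mathrm{gfp}\ \lambda Z.(Z \sqcap^\sharp F^\sharp(Z) \sqcap^\sharp B^\sharp(Z))$ is the greatest lower
bound of the set $\mathcal{F}^\sharp$ defined inductively as:

- $\top^\sharp \in \mathcal{F}^\sharp$.
- If $Z$ is in $\mathcal{F}^\sharp$, then so are $F^\sharp(Z)$ and $B^\sharp(Z)$.
- If $Z_1$ and $Z_2$ are in $\mathcal{F}^\sharp$, then so are $Z_1 \sqcap^\sharp Z_2$ and $Z_1 \sqcup^\sharp Z_2$.

Therefore, $L^\sharp$ is the best upper approximation of $\alpha(L^\natural)$ that can be
obtained using $F^\sharp$ and $B^\sharp$.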
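On a finite powerset, and taking the abstraction to be the identity (so that F♯ = F♮ and B♯ = B♮), the sequence of Lemma 1 can be computed directly. The following Python sketch is such a simplification, not an analyzer over an infinite abstract domain: it alternates the two refinements until the decreasing sequence stabilizes, which for monotonic F and B yields the greatest fixpoint of λZ.(Z ∩ F(Z) ∩ B(Z)). The example functions are hypothetical.

```python
def combine(F, B, top):
    """Compute X0 = top, X(2n+1) = X(2n) & F(X(2n)),
    X(2n+2) = X(2n+1) & B(X(2n+1)) until the sequence stabilizes.
    F and B must be monotonic maps on frozensets."""
    x = frozenset(top)
    while True:
        y = x & F(x)    # forward refinement
        y = y & B(y)    # backward refinement
        if y == x:      # decreasing sequence on a finite lattice: it stops
            return x
        x = y


# Example: keep states whose predecessor is kept (or which are 0)
# and whose successor is kept (or which are 3).
def F(Z):
    return frozenset(z for z in Z if z - 1 in Z or z == 0)


def B(Z):
    return frozenset(z for z in Z if z + 1 in Z or z == 3)


result = combine(F, B, {0, 1, 2, 3, 5})
```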

2.2 Standard Backward-Forward Combination 

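The standard backward-forward combination [...] derives from an application of
this lemma to the particular case of backward and forward collecting semantics:
$F^\natural$, $B^\natural$, $F^\sharp$ and $B^\sharp$ are instantiated as follows:

$$F^\natural = \lambda Y.\,\mathrm{lgfp}_1\ \lambda X.(Y \sqcap^\natural f^\natural(X)) \qquad B^\natural = \lambda Y.\,\mathrm{lgfp}_2\ \lambda X.(Y \sqcap^\natural b^\natural(X))$$

$$F^\sharp = \lambda Y.\,\mathrm{lgfp}_1\ \lambda X.(Y \sqcap^\sharp f^\sharp(X)) \qquad B^\sharp = \lambda Y.\,\mathrm{lgfp}_2\ \lambda X.(Y \sqcap^\sharp b^\sharp(X))$$

where lgfp means either lfp or gfp, and $b^\natural, f^\natural \in P^\natural \to P^\natural$ and
$b^\sharp, f^\sharp \in P^\sharp \to P^\sharp$ are monotonic. When $\alpha \circ f^\natural \circ \gamma \sqsubseteq^\sharp f^\sharp$ and
$\alpha \circ b^\natural \circ \gamma \sqsubseteq^\sharp b^\sharp$, the conditions of Lemma 1 are satisfied [...].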

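Now, let $P^\natural = \wp(\Sigma)$, $\Sigma$ a set of states³, and $\tau \in \wp(\Sigma \times \Sigma)$ a
transition relation. As usual, we define $\mathit{pre}$, $\widetilde{\mathit{pre}}$, $\mathit{post}$, $\widetilde{\mathit{post}}$ as:

$$\begin{aligned}
\mathit{post}(X) &= \{s' \mid \exists s : (s, s') \in \tau \wedge s \in X\} \\
\widetilde{\mathit{post}}(X) &= \{s' \mid \forall s : (s, s') \in \tau \Rightarrow s \in X\} \\
\mathit{pre}(X) &= \{s \mid \exists s' : (s, s') \in \tau \wedge s' \in X\} \\
\widetilde{\mathit{pre}}(X) &= \{s \mid \forall s' : (s, s') \in \tau \Rightarrow s' \in X\}
\end{aligned}$$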
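For a finite state set, the four operators translate directly into Python set comprehensions. The sketch below represents τ as a set of pairs; the names `post_tilde` and `pre_tilde` (for the tilde operators) and the small example relation are ours. The universal operators hold vacuously for states with no predecessor (respectively no successor), and the tilde operators are the duals of the existential ones, for instance $\widetilde{\mathit{pre}}(X) = \Sigma \setminus \mathit{pre}(\Sigma \setminus X)$.

```python
def post(tau, X):
    """States reachable in one step from some state of X."""
    return frozenset(t for (s, t) in tau if s in X)


def pre(tau, X):
    """States with at least one successor in X."""
    return frozenset(s for (s, t) in tau if t in X)


def post_tilde(tau, states, X):
    """States all of whose predecessors lie in X."""
    return frozenset(t for t in states
                     if all(s in X for (s, u) in tau if u == t))


def pre_tilde(tau, states, X):
    """States all of whose successors lie in X."""
    return frozenset(s for s in states
                     if all(t in X for (u, t) in tau if u == s))


states = frozenset(range(4))
tau = {(0, 1), (0, 2), (1, 3), (2, 3)}
```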

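Given $I, F \subseteq \Sigma$, sets of initial and final states, $f^\natural = \lambda X.(I \cup \mathit{post}(X))$
and $b^\natural = \lambda X.(F \cup \mathit{pre}(X))$ (and $\mathrm{lgfp}_1 = \mathrm{lgfp}_2 = \mathrm{lfp}$), we have [...]:

$$L^\natural = F^\natural(\top^\natural) \cap B^\natural(\top^\natural) = \mathrm{lfp}\, f^\natural \cap \mathrm{lfp}\, b^\natural \qquad (1)$$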

By computing $\gamma(L^\sharp)$⁴, we obtain a good upper approximation of
$F^\natural(\top^\natural) \cap B^\natural(\top^\natural)$ (at least as precise as $\gamma(F^\sharp(\top^\sharp) \sqcap^\sharp B^\sharp(\top^\sharp))$).

$B^\natural(\top^\natural)$ is the set of states satisfying the μ-calculus formula $\mu X.(F \vee \Diamond X)$,
where the proposition $F$ holds exactly on the set $F$ of final states. Therefore the
method is used to analyze reachability properties (or rather unreachability
properties, since we compute a superset of the reachable states, that is, a
subset of the unreachable states). Equation (1) also holds with the μ-calculus
formula $\nu X.(F \wedge \Diamond X)$, and this result allows the analysis of termination
properties.
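Equation (1) can be checked on a small example by explicit-state computation. The Python sketch below (identity abstraction, hypothetical graph) computes $\mathrm{lfp}\, f^\natural \cap \mathrm{lfp}\, b^\natural$ and the limit $L^\natural$ of the backward-forward iteration, and obtains the same set: the states that are reachable from I and can reach F.

```python
def lfp(f):
    """Kleene iteration from the empty set (f monotonic, finite domain)."""
    x = frozenset()
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


states = frozenset(range(7))
tau = {(0, 1), (1, 2), (2, 3), (3, 3), (1, 4), (4, 4), (5, 3), (6, 6)}
I, Fin = frozenset({0}), frozenset({3})


def post(X):
    return frozenset(t for (s, t) in tau if s in X)


def pre(X):
    return frozenset(s for (s, t) in tau if t in X)


def F(Y):  # F(Y) = lfp X. Y & (I | post X)
    return lfp(lambda X: Y & (I | post(X)))


def B(Y):  # B(Y) = lfp X. Y & (Fin | pre X)
    return lfp(lambda X: Y & (Fin | pre(X)))


# Limit of the alternating sequence of Lemma 1, starting from the top.
L = states
while True:
    nxt = L & F(L)
    nxt = nxt & B(nxt)
    if nxt == L:
        break
    L = nxt

rhs = lfp(lambda X: I | post(X)) & lfp(lambda X: Fin | pre(X))
```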

For other formulas (like $\mu X.(F \vee \Box X)$), equation (1) does not hold in general:
a state may satisfy the backward property and be reachable from an initial state
which does not satisfy the backward property (an example is given in Figure 1).

In the next section, we will prove that under certain conditions we can still 
use the combination for formulas with a single fixpoint. 



3 Extension with a Single Fixpoint 

As we want to deal with μ-calculus formulas in this section, we assume that
$f^\natural = \lambda X.(I \cup \mathit{post}(X))$ and that $\mathrm{lgfp}_1 = \mathrm{lfp}$. Computing (or approximating)
$\alpha(L^\natural)$ is useless if we do not know what $L^\natural$ represents. Equation (1) must hold
in order to approximate the set of descendants of $I$ which satisfy $B^\natural(\top^\natural)$.
However, if we want to check a temporal formula, we just need to know the set
of initial states which satisfy it, that is, $I \cap B^\natural(\top^\natural)$. Thus the combination
is useful if the equality

$$I \cap B^\natural(\top^\natural) = I \cap L^\natural \qquad (2)$$

is satisfied⁵.

³ So $\sqsubseteq^\natural = \subseteq$, $\bot^\natural = \emptyset$, $\sqcap^\natural = \cap$, $\sqcup^\natural = \cup$.

⁴ We may over-approximate the greatest fixpoint with a narrowing [...].

Fig. 1. Example where $L^\natural \neq F^\natural(\top^\natural) \cap B^\natural(\top^\natural)$. Here, $b^\natural = \lambda X.(F \cup \widetilde{\mathit{pre}}(X))$
and $f^\natural = \lambda X.(I \cup \mathit{post}(X))$. $L^\natural$ is then the set of states belonging to a trace
of states which satisfy the backward property.

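Lemma 2. Assuming that $f^\natural = \lambda X.(I \cup \mathit{post}(X))$, if $L^\natural = F^\natural(B^\natural(\Sigma))$, then

$$I \cap B^\natural(\Sigma) = I \cap L^\natural.$$

Proof. As $L^\natural = F^\natural(B^\natural(\Sigma)) = \mathrm{lfp}\ \lambda X.(B^\natural(\Sigma) \cap f^\natural(X))$, it is clear that
$L^\natural \subseteq B^\natural(\Sigma)$. Moreover, the first iterate of the least fixpoint is $B^\natural(\Sigma) \cap f^\natural(\emptyset)$,
which is equal to $B^\natural(\Sigma) \cap I$. So we have $B^\natural(\Sigma) \cap I \subseteq L^\natural$, which proves the equality.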

With $B^\natural = \lambda Y.\,\mathrm{lgfp}_2\ \lambda X.(Y \cap b^\natural(X))$, we have the following lemma:

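Lemma 3. If $\forall (X, Y) \in \wp(\Sigma)^2,\ Y \subseteq b^\natural(X) \iff Y \subseteq b^\natural(X \cap \mathit{post}(Y))$, then the
hypothesis of Lemma 2 holds. Thus, equation (2) holds.

Proof. We note that the hypothesis implies:

$$\forall (X, Y) \in \wp(\Sigma)^2,\quad Y \subseteq b^\natural(X) \iff Y \subseteq b^\natural(X \cap f^\natural(Y))$$

⁵ And this equality is satisfied when equation (1) holds.

We want to prove that $L^\natural = F^\natural(B^\natural(\Sigma))$. The inclusion $L^\natural \subseteq F^\natural(B^\natural(\Sigma))$ is a
consequence of the optimality of $L^\natural$:

$$\begin{aligned}
L^\natural &= \mathrm{gfp}\ \lambda Z.(Z \cap F^\natural(Z) \cap B^\natural(Z)) \\
&\subseteq F^\natural(F^\natural(\Sigma) \cap B^\natural(\Sigma)) \cap B^\natural(F^\natural(\Sigma) \cap B^\natural(\Sigma)) \\
&\subseteq F^\natural(B^\natural(\Sigma))
\end{aligned}$$

Thus, to prove the equality, we need to check that $F^\natural(B^\natural(\Sigma))$ is a fixpoint of
$\lambda Z.(Z \cap F^\natural(Z) \cap B^\natural(Z))$, that is, to prove that $F^\natural(B^\natural(\Sigma)) \subseteq F^\natural(F^\natural(B^\natural(\Sigma)))$ and
$F^\natural(B^\natural(\Sigma)) \subseteq B^\natural(F^\natural(B^\natural(\Sigma)))$. The former inequality is true because $F^\natural \circ F^\natural = F^\natural$.
To prove the latter, we define $\Omega = F^\natural(B^\natural(\Sigma))$ and $\Omega' = B^\natural(F^\natural(B^\natural(\Sigma)))$.
We distinguish two cases: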

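- If $\mathrm{lgfp}_2 = \mathrm{lfp}$, let $(X_n)_{n \geq 0}$ be the (transfinite) iteration sequence of $b^\natural$
starting from $\emptyset$. We will prove that $\Omega \cap X_n \subseteq \Omega'$ for all $n \geq 0$. This is true if
$n = 0$, because $X_0 = \emptyset$.

If $n$ is a successor ordinal, and if the inequality holds for $n - 1$, we have
$\Omega \cap X_n \subseteq b^\natural(X_{n-1})$. Using the hypothesis of the lemma, we obtain
$\Omega \cap X_n \subseteq b^\natural(X_{n-1} \cap f^\natural(\Omega \cap X_n))$. By monotony, $f^\natural(\Omega \cap X_n) \subseteq f^\natural(\Omega)$. So, since
$X_{n-1} \subseteq B^\natural(\Sigma)$:

$$\begin{aligned}
X_{n-1} \cap f^\natural(\Omega \cap X_n) &\subseteq X_{n-1} \cap B^\natural(\Sigma) \cap f^\natural(\Omega) \\
&\subseteq X_{n-1} \cap \Omega \\
&\subseteq \Omega'
\end{aligned}$$

Hence $\Omega \cap X_n$ is included in $\Omega \cap b^\natural(\Omega')$, which is equal to $\Omega'$ by definition
of $\Omega'$.

When $n$ is a limit ordinal ($b^\natural$ may not be continuous), if $\Omega \cap X_{n'} \subseteq \Omega'$ for
all $n' < n$, then $\Omega \cap X_n = \Omega \cap \bigcup_{n' < n} X_{n'} = \bigcup_{n' < n} (\Omega \cap X_{n'}) \subseteq \Omega'$.

By transfinite induction, $\Omega \cap X_n \subseteq \Omega'$ for all $n$. As the upper bound of $(X_n)$
is $B^\natural(\Sigma)$, which includes $\Omega$, we have $\Omega \subseteq \Omega'$.

- If $\mathrm{lgfp}_2 = \mathrm{gfp}$, let $(X_n)_{n \geq 0}$ be the (transfinite) iteration sequence starting
from $\Sigma$ for $\lambda X.(\Omega \cap b^\natural(X))$. The limit of $X_n$ is $\Omega'$. $X_1 = \Omega \cap b^\natural(\Sigma) = \Omega$, since
$\Omega \subseteq B^\natural(\Sigma) \subseteq b^\natural(\Sigma)$. Moreover, since $B^\natural(\Sigma) = b^\natural(B^\natural(\Sigma))$, $\Omega \subseteq b^\natural(B^\natural(\Sigma))$,
so $\Omega \subseteq b^\natural(B^\natural(\Sigma) \cap f^\natural(\Omega))$. As $B^\natural(\Sigma) \cap f^\natural(\Omega) = \Omega$, we have $\Omega \subseteq b^\natural(\Omega)$, and
$X_2 = \Omega$. Thus $X_n = \Omega$ for all $n \geq 1$, and $\Omega' = \Omega$.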

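Application: Let $b^\natural = \lambda X.(A \cup (B \cap \mathit{pre}(X)) \cup (C \cap \widetilde{\mathit{pre}}(X)))$.

If $Y \subseteq b^\natural(X)$, then, for all $y \in Y$:

- if $y \in A$, then $y \in b^\natural(X \cap \mathit{post}(Y))$;
- if $y \in B \cap \mathit{pre}(X)$, then there exists $x \in X$ such that $(y, x) \in \tau$. Therefore, since
$y \in Y$, $x \in \mathit{post}(Y)$, and $y \in B \cap \mathit{pre}(X \cap \mathit{post}(Y))$. Thus $y \in b^\natural(X \cap \mathit{post}(Y))$;
- if $y \in C \cap \widetilde{\mathit{pre}}(X)$, then $\forall x \in \Sigma, (y, x) \in \tau \Rightarrow x \in X$. As $y \in Y$,
$(y, x) \in \tau \Rightarrow x \in \mathit{post}(Y)$, so $\forall x \in \Sigma, (y, x) \in \tau \Rightarrow x \in X \cap \mathit{post}(Y)$.
Thus $y \in C \cap \widetilde{\mathit{pre}}(X \cap \mathit{post}(Y))$, so $y \in b^\natural(X \cap \mathit{post}(Y))$.

Therefore, $Y \subseteq b^\natural(X) \Rightarrow Y \subseteq b^\natural(X \cap \mathit{post}(Y))$. The other direction of the
equivalence follows from the monotony of $b^\natural$. Thus the hypothesis of Lemma 3 is
satisfied, and equation (2) holds.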

So we can use the backward-forward combination to enhance the verification
of properties of the form $\sigma X.(A \vee (B \wedge \Diamond X) \vee (C \wedge \Box X))$ with $\sigma \in \{\mu, \nu\}$.
These properties are interesting: they make it possible to distinguish between
different kinds of non-determinism (“controllable” and “uncontrollable”
non-determinism). We are not far from game properties, as we will see in
Section 5.
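The following Python sketch checks equation (2) on a small hypothetical system for a formula of this shape, μX.(A ∨ (B ∧ ◇X) ∨ (C ∧ □X)). It computes |φ| directly and compares I ∩ |φ| with I ∩ L♮, where L♮ is obtained by the backward-forward iteration with b♮ as in the application above. The abstraction is the identity, so this only illustrates the concrete equality. State 5 satisfies φ but is unreachable, so L♮ is strictly smaller than |φ| while agreeing with it on I.

```python
def lfp(f):
    x = frozenset()
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


states = frozenset(range(6))
tau = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (4, 4), (5, 3)}
A, B, C = frozenset({3}), frozenset({0}), frozenset({1, 2, 5})
I = frozenset({0})


def post(X):
    return frozenset(t for (s, t) in tau if s in X)


def pre(X):
    return frozenset(s for (s, t) in tau if t in X)


def pre_tilde(X):
    return frozenset(s for s in states
                     if all(t in X for (u, t) in tau if u == s))


def b(X):
    return A | (B & pre(X)) | (C & pre_tilde(X))


phi = lfp(b)  # |mu X.(A or (B and <>X) or (C and []X))|

# Backward-forward combination (Lemma 1 with lgfp1 = lgfp2 = lfp).
L = states
while True:
    nxt = L & lfp(lambda X: L & (I | post(X)))   # forward step
    nxt = nxt & lfp(lambda X: nxt & b(X))        # backward step
    if nxt == L:
        break
    L = nxt
```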

Unfortunately, the extension of this technique to the whole μ-calculus does
not work: for example, a formula like $\mu X.(F \vee \Diamond\Diamond X)$ leaves “holes” in traces,
preventing combination with forward analysis. However, it is possible, from this
result, to extend it to a temporal logic at least as expressive as CTL.



4 Extension to a Larger Specification Language 



In this section, we try to apply the backward-forward combination to the
verification of some μ-calculus formulas. If $\varphi$ is a formula, we denote by $|\varphi|$
the set of states (in $\Sigma$) satisfying $\varphi$.

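The formulas $\varphi$ are defined by the grammar:

$$\varphi ::= p \mid \neg p \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \Diamond\varphi_1 \mid \Box\varphi_1 \mid \sigma X.(\varphi_1 \vee (\varphi_2 \wedge \Diamond X) \vee (\varphi_3 \wedge \Box X))$$

with $\sigma \in \{\mu, \nu\}$. It is worth noting that all these formulas are closed, and the
defined temporal logic includes CTL. Obviously, the logic does not change if we
replace $\sigma X.(\varphi_1 \vee (\varphi_2 \wedge \Diamond X) \vee (\varphi_3 \wedge \Box X))$ with $\sigma X.(\varphi_1 \wedge (\varphi_2 \vee \Diamond X) \wedge (\varphi_3 \vee \Box X))$
in the above grammar.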
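To make the grammar concrete, here is a hypothetical Python evaluator computing |φ| on an explicit finite transition system (no abstraction). Formulas are encoded as tuples of our own choosing: `("prop", p)`, `("nprop", p)`, `("and", φ1, φ2)`, `("or", φ1, φ2)`, `("ex", φ1)` for ◇φ1, `("ax", φ1)` for □φ1 and `("fix", σ, φ1, φ2, φ3)` for σX.(φ1 ∨ (φ2 ∧ ◇X) ∨ (φ3 ∧ □X)). The usage lines encode the CTL formulas EF goal, AF goal and EG p with the fixpoint production.

```python
def evaluate(phi, states, tau, val):
    """Return the set of states satisfying phi (explicit-state semantics)."""
    def pre(X):
        return frozenset(s for (s, t) in tau if t in X)

    def pre_tilde(X):
        return frozenset(s for s in states
                         if all(t in X for (u, t) in tau if u == s))

    kind = phi[0]
    if kind == "prop":
        return frozenset(val[phi[1]])
    if kind == "nprop":
        return frozenset(states) - frozenset(val[phi[1]])
    if kind in ("and", "or"):
        left = evaluate(phi[1], states, tau, val)
        right = evaluate(phi[2], states, tau, val)
        return left & right if kind == "and" else left | right
    if kind == "ex":
        return pre(evaluate(phi[1], states, tau, val))
    if kind == "ax":
        return pre_tilde(evaluate(phi[1], states, tau, val))
    if kind == "fix":
        _, sigma, p1, p2, p3 = phi
        s1, s2, s3 = (evaluate(p, states, tau, val) for p in (p1, p2, p3))
        # Kleene iteration from the bottom (mu) or from the top (nu).
        x = frozenset() if sigma == "mu" else frozenset(states)
        while True:
            y = s1 | (s2 & pre(x)) | (s3 & pre_tilde(x))
            if y == x:
                return x
            x = y
    raise ValueError(f"unknown formula {phi!r}")


states = frozenset(range(5))
tau = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (4, 4)}
val = {"goal": {3}, "p": {0, 2, 4}, "true": states}
TRUE, FALSE = ("prop", "true"), ("nprop", "true")
ef_goal = ("fix", "mu", ("prop", "goal"), TRUE, FALSE)   # EF goal
af_goal = ("fix", "mu", ("prop", "goal"), FALSE, TRUE)   # AF goal
eg_p = ("fix", "nu", FALSE, ("prop", "p"), FALSE)        # EG p
```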

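Our goal is to obtain a “good” upper approximation $\Omega_\varphi(\alpha(I))$ of $\alpha(I \cap |\varphi|)$,
using the backward-forward technique to enhance fixpoint computations. We
assume that for every proposition $p$, we have an upper approximation $|p|^\sharp$ of
$\alpha(|p|)$⁶ (and an upper approximation $|\neg p|^\sharp$ of $\alpha(|\neg p|)$).

We need abstractions of $\mathit{pre}$, $\mathit{post}$ and $\widetilde{\mathit{pre}}$, which we will denote by $\mathit{pre}^\sharp$,
$\mathit{post}^\sharp$ and $\widetilde{\mathit{pre}}^\sharp$ respectively. Moreover, we will denote by $\mathit{post}^*$ and $\mathit{post}^{*\sharp}$
the functions $\lambda X.\,\mathrm{lfp}\ \lambda Y.(X \cup \mathit{post}(Y))$ and $\lambda X.\,\mathrm{lfp}\ \lambda Y.(X \sqcup^\sharp \mathit{post}^\sharp(Y))$
respectively. The following inequalities are assumed to be satisfied for all $X \subseteq \Sigma$:

$$\begin{aligned}
\alpha \circ \mathit{pre}(X) &\sqsubseteq^\sharp \mathit{pre}^\sharp \circ \alpha(X) \\
\alpha \circ \widetilde{\mathit{pre}}(X) &\sqsubseteq^\sharp \widetilde{\mathit{pre}}^\sharp \circ \alpha(X) \\
\alpha \circ \mathit{post}(X) &\sqsubseteq^\sharp \mathit{post}^\sharp \circ \alpha(X) \\
\alpha \circ \mathit{post}^*(X) &\sqsubseteq^\sharp \mathit{post}^{*\sharp} \circ \alpha(X)
\end{aligned}$$

⁶ Which may be $\alpha(|p|)$, if it is computable.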
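To simplify notation, we denote by $L^\eta_{\mathrm{lgfp}}(f, g)$, with $\eta \in \{\natural, \sharp\}$, the
expression:

$$\mathrm{gfp}\ \lambda Z.\big(\mathrm{lfp}\ \lambda X.(Z \sqcap^\eta f(X)) \sqcap^\eta \mathrm{lgfp}\ \lambda X.(Z \sqcap^\eta g(X))\big)$$

$L^\eta_{\mathrm{lgfp}}(f, g)$ is the limit of the decreasing chain defined by $Z_0 = \top^\eta$,
$Z_{2n+1} = Z_{2n} \sqcap^\eta \mathrm{lfp}\ \lambda X.(Z_{2n} \sqcap^\eta f(X))$ and
$Z_{2n+2} = Z_{2n+1} \sqcap^\eta \mathrm{lgfp}\ \lambda X.(Z_{2n+1} \sqcap^\eta g(X))$.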
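In the concrete case (η = ♮, with ⊓♮ = ∩ on a finite powerset), the chain defining $L^\eta_{\mathrm{lgfp}}(f, g)$ can be computed as below. This Python sketch is an illustration under those assumptions; the parameter `use_gfp` selects whether lgfp is lfp or gfp. The usage lines apply it to f = λX.(I ∪ post(X)) and g = λX.(p ∩ pre(X)), that is, to EG p restricted to the states reachable from a hypothetical initial set I.

```python
def iterate(h, start):
    """Iterate a monotonic h from `start` until stabilization."""
    x = start
    while True:
        y = h(x)
        if y == x:
            return x
        x = y


def L_lgfp(f, g, top, use_gfp=False):
    """Limit of Z0 = top, Z(2n+1) = Z(2n) & lfp(X -> Z(2n) & f(X)),
    Z(2n+2) = Z(2n+1) & lgfp(X -> Z(2n+1) & g(X))."""
    z = frozenset(top)
    while True:
        z1 = z & iterate(lambda X: z & f(X), frozenset())
        # gfp: iterate downwards from z1, which bounds the gfp from above;
        # lfp: iterate upwards from the empty set.
        start = z1 if use_gfp else frozenset()
        z2 = z1 & iterate(lambda X: z1 & g(X), start)
        if z2 == z:
            return z
        z = z2


states = frozenset(range(5))
tau = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (4, 4)}
I, p = frozenset({0}), frozenset({0, 2, 4})


def post(X):
    return frozenset(t for (s, t) in tau if s in X)


def pre(X):
    return frozenset(s for (s, t) in tau if t in X)


L = L_lgfp(lambda X: I | post(X), lambda X: p & pre(X), states, use_gfp=True)
```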

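If $\varphi$ is a formula, we can now define $\Omega_\varphi \in P^\sharp \to P^\sharp$ as follows:

$$\begin{aligned}
\Omega_p(S) &= |p|^\sharp \sqcap^\sharp S \\
\Omega_{\neg p}(S) &= |\neg p|^\sharp \sqcap^\sharp S \\
\Omega_{\varphi_1 \wedge \varphi_2}(S) &= \Omega_{\varphi_1}(S) \sqcap^\sharp \Omega_{\varphi_2}(S) \\
\Omega_{\varphi_1 \vee \varphi_2}(S) &= \Omega_{\varphi_1}(S) \sqcup^\sharp \Omega_{\varphi_2}(S) \\
\Omega_{\Box\varphi_1}(S) &= \widetilde{\mathit{pre}}^\sharp(\Omega_{\varphi_1}(\mathit{post}^\sharp(S))) \\
\Omega_{\Diamond\varphi_1}(S) &= \mathit{pre}^\sharp(\Omega_{\varphi_1}(\mathit{post}^\sharp(S))) \\
\Omega_{\sigma X.(\varphi_1 \vee (\varphi_2 \wedge \Diamond X) \vee (\varphi_3 \wedge \Box X))}(S) &= L^\sharp_{\mathrm{lgfp}}\big(\lambda X.(S \sqcup^\sharp \mathit{post}^\sharp(X)),\ \lambda Y.(\Omega_{\varphi_1}(\mathit{post}^{*\sharp}(S)) \\
&\qquad \sqcup^\sharp (\Omega_{\varphi_2}(\mathit{post}^{*\sharp}(S)) \sqcap^\sharp \mathit{pre}^\sharp(Y)) \\
&\qquad \sqcup^\sharp (\Omega_{\varphi_3}(\mathit{post}^{*\sharp}(S)) \sqcap^\sharp \widetilde{\mathit{pre}}^\sharp(Y)))\big)
\end{aligned}$$

where lgfp stands for lfp when $\sigma = \mu$ and for gfp when $\sigma = \nu$.

It is worth noting that even if we replace $\mathit{post}^{*\sharp}(S)$ by $\top^\sharp$ in the last lines, we
still have to compute it, as the first iteration that leads to $L^\sharp_{\mathrm{lgfp}}$. However, this
replacement may leave the final result of $\Omega_\varphi$ unchanged while making the
computation much faster (because the computation of $\Omega_{\varphi_i}(\top^\sharp)$ can be
simplified). The following theorem is valid with or without the replacement: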

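Theorem 1. For any formula $\varphi$ and $I \subseteq \Sigma$:

$$\alpha(I \cap |\varphi|) \sqsubseteq^\sharp \alpha(I) \sqcap^\sharp \Omega_\varphi(\alpha(I))$$

Proof. The proof is by induction on the structure of $\varphi$. By monotony of $\alpha$, it is
obvious that $\alpha(I \cap |\varphi|) \sqsubseteq^\sharp \alpha(I)$, so we need to prove that $\alpha(I \cap |\varphi|) \sqsubseteq^\sharp \Omega_\varphi(\alpha(I))$.

If $\varphi = p$, by monotony of $\alpha$:

$$\alpha(I \cap |p|) \sqsubseteq^\sharp \alpha(I) \sqcap^\sharp \alpha(|p|) \sqsubseteq^\sharp \alpha(I) \sqcap^\sharp |p|^\sharp = \Omega_p(\alpha(I))$$

If $\varphi = \varphi_1 \wedge \varphi_2$:

$$\begin{aligned}
\alpha(I \cap |\varphi|) &= \alpha(I \cap |\varphi_1| \cap I \cap |\varphi_2|) \\
&\sqsubseteq^\sharp \alpha(I \cap |\varphi_1|) \sqcap^\sharp \alpha(I \cap |\varphi_2|) \\
&\sqsubseteq^\sharp \Omega_{\varphi_1}(\alpha(I)) \sqcap^\sharp \Omega_{\varphi_2}(\alpha(I)) \\
&= \Omega_\varphi(\alpha(I))
\end{aligned}$$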



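If $\varphi = \varphi_1 \vee \varphi_2$, the computation is essentially the same, except that we use the
fact that $\alpha$ is additive [...].

If $\varphi = \Box\varphi_1$, we have $I \subseteq \widetilde{\mathit{pre}}(\mathit{post}(I))$, so:

$$\begin{aligned}
\alpha(I \cap |\varphi|) &\sqsubseteq^\sharp \alpha(\widetilde{\mathit{pre}}(\mathit{post}(I) \cap |\varphi_1|)) \\
&\sqsubseteq^\sharp \widetilde{\mathit{pre}}^\sharp(\alpha(\mathit{post}(I) \cap |\varphi_1|)) \\
&\sqsubseteq^\sharp \widetilde{\mathit{pre}}^\sharp(\Omega_{\varphi_1}(\alpha \circ \mathit{post}(I))) \\
&\sqsubseteq^\sharp \widetilde{\mathit{pre}}^\sharp(\Omega_{\varphi_1}(\mathit{post}^\sharp \circ \alpha(I))) = \Omega_\varphi(\alpha(I))
\end{aligned}$$

If $\varphi = \Diamond\varphi_1$, using the inequality $I \cap \mathit{pre}(|\varphi_1|) \subseteq \mathit{pre}(\mathit{post}(I) \cap |\varphi_1|)$, we
can do the same computation.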

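If $\varphi = \sigma X.(\varphi_1 \vee (\varphi_2 \wedge \Diamond X) \vee (\varphi_3 \wedge \Box X))$, let us define
$b^\natural = \lambda X.(|\varphi_1| \cup (|\varphi_2| \cap \mathit{pre}(X)) \cup (|\varphi_3| \cap \widetilde{\mathit{pre}}(X)))$. Then $|\varphi| = \mathrm{lgfp}\ \lambda X.\,b^\natural(X)$.

We will use Lemma 1 with

$$\begin{aligned}
f^\natural &= \lambda X.(I \cup \mathit{post}(X)) \\
f^\sharp &= \lambda X.(\alpha(I) \sqcup^\sharp \mathit{post}^\sharp(X)) \\
b_1^\natural &= \lambda X.(\mathit{post}^*(I) \cap (|\varphi_1| \cup (|\varphi_2| \cap \mathit{pre}(X)) \cup (|\varphi_3| \cap \widetilde{\mathit{pre}}(X)))) \\
b_1^\sharp &= \lambda X.\big(\alpha(\mathit{post}^*(I) \cap |\varphi_1|) \sqcup^\sharp (\alpha(\mathit{post}^*(I) \cap |\varphi_2|) \sqcap^\sharp \mathit{pre}^\sharp(X)) \\
&\qquad \sqcup^\sharp (\alpha(\mathit{post}^*(I) \cap |\varphi_3|) \sqcap^\sharp \widetilde{\mathit{pre}}^\sharp(X))\big)
\end{aligned}$$

It is clear that $\alpha \circ f^\natural \circ \gamma \sqsubseteq^\sharp f^\sharp$ and $\alpha \circ b_1^\natural \circ \gamma \sqsubseteq^\sharp b_1^\sharp$ (given the standard
properties that $\alpha$ is additive and $\alpha \circ \gamma$ is a lower closure operator [...]). Thus we
have $\alpha(L^\natural_{\mathrm{lgfp}}(f^\natural, b_1^\natural)) \sqsubseteq^\sharp L^\sharp_{\mathrm{lgfp}}(f^\sharp, b_1^\sharp)$.

First, we prove that $I \cap L^\natural_{\mathrm{lgfp}}(f^\natural, b_1^\natural) = I \cap |\varphi|$. We have:

$$L^\natural_{\mathrm{lgfp}}(f^\natural, b^\natural) \subseteq \mathrm{lgfp}\ \lambda X.((\mathrm{lfp}\ \lambda Y.f^\natural(Y)) \cap b^\natural(X)) \subseteq |\varphi|$$

Applying Lemma 3 with $b^\natural$, we obtain $I \cap |\varphi| = I \cap L^\natural_{\mathrm{lgfp}}(f^\natural, b^\natural)$, so
$I \cap |\varphi| = I \cap \mathrm{lgfp}\ \lambda X.((\mathrm{lfp}\ \lambda Y.f^\natural(Y)) \cap b^\natural(X)) = I \cap \mathrm{lgfp}\ \lambda X.\,b_1^\natural(X)$. Then,
applying Lemma 3 with $b_1^\natural$, we obtain $I \cap \mathrm{lgfp}\ \lambda X.\,b_1^\natural(X) = I \cap L^\natural_{\mathrm{lgfp}}(f^\natural, b_1^\natural)$,
proving the equality.

So we have proved:

$$\begin{aligned}
\alpha(I \cap |\varphi|) &= \alpha(I \cap L^\natural_{\mathrm{lgfp}}(f^\natural, b_1^\natural)) \\
&\sqsubseteq^\sharp \alpha(I) \sqcap^\sharp \alpha(L^\natural_{\mathrm{lgfp}}(f^\natural, b_1^\natural)) \\
&\sqsubseteq^\sharp \alpha(I) \sqcap^\sharp L^\sharp_{\mathrm{lgfp}}(f^\sharp, b_1^\sharp)
\end{aligned}$$

Now we need to check that:

$$L^\sharp_{\mathrm{lgfp}}(f^\sharp, b_1^\sharp) \sqsubseteq^\sharp \Omega_\varphi(\alpha(I))$$

By the induction hypothesis, $\Omega_\varphi(\alpha(I)) = L^\sharp_{\mathrm{lgfp}}(f^\sharp, b'^\sharp)$ with $b'^\sharp$ satisfying
$b_1^\sharp \sqsubseteq^\sharp b'^\sharp$, which completes the proof.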
5 Extension to Alternating-Time Temporal Logic

Many properties of reactive systems are not easily expressible as μ-calculus
formulas. This is the case for game properties, which can be expressed as
formulas of the alternating-time μ-calculus (Aμ) [...], or as formulas of the
weaker game logics ATL and ATL*. In [...], basic abstract interpretation theory
was applied to alternating transition systems, with a model-checking approach to
abstraction. As we did with a subset of the μ-calculus, we will try to apply
forward-backward techniques to Aμ.

5.1 Alternating Transition System, Operators 

An alternating transition system is a tuple $(\Sigma, Q, \Delta, \Pi, \pi)$, with $\Sigma$ a set of
players, $Q$ a set of states, $\Delta = \{\delta_i : Q \to 2^{2^Q} \mid i \in \Sigma\}$ a set of transition
functions, $\Pi$ a set of propositions, and $\pi : \Pi \to 2^Q$ a function which associates
each proposition with a set of states.

When the system is in state $q$, each player $a$ must choose a set $Q_a \in \delta_a(q)$,
and the successor of the state $q$ must lie in $\bigcap_{a \in \Sigma} Q_a$ (it is assumed that the
intersection is always a singleton, so the transition function is nonblocking and
“deterministic”). Thus, if we want an equivalent of the $\mathit{post}$ operator used in the
non-game case, it would be:

$$\mathit{Post}(\sigma) = \bigcup_{q \in \sigma} \Big(\bigcap_{a \in \Sigma} \bigcup \delta_a(q)\Big)$$

As before, we can define $\mathit{Post}^*(\sigma) = \mathrm{lfp}\ \lambda X.(\sigma \cup \mathit{Post}(X))$.

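The equivalents of the $\mathit{pre}$ and $\widetilde{\mathit{pre}}$ operators are the controllable and
uncontrollable predecessor relations, defined in [...]. In the general case, with
$I \subseteq \Sigma$, they are defined as:

$$\begin{aligned}
q \in \mathit{CPre}_I(\sigma) &\iff \exists (\tau_i \in \delta_i(q))_{i \in I}.\ \forall (\tau_i \in \delta_i(q))_{i \in \Sigma \setminus I}.\ \textstyle\bigcap_{i \in \Sigma} \tau_i \subseteq \sigma \\
q \in \mathit{UPre}_I(\sigma) &\iff \forall (\tau_i \in \delta_i(q))_{i \in I}.\ \exists (\tau_i \in \delta_i(q))_{i \in \Sigma \setminus I}.\ \textstyle\bigcap_{i \in \Sigma} \tau_i \subseteq \sigma
\end{aligned}$$

$q \in \mathit{CPre}_I(\sigma)$ means that, when the system is in state $q$, the team $I$ can force the
successor state of $q$ to be in $\sigma$. If $q \in \mathit{UPre}_I(\sigma)$, it means that in state $q$, the team
$I$ cannot force the game outside $\sigma$. Of course, if there is only one player, these
two operators are equivalent to $\mathit{pre}$ and $\widetilde{\mathit{pre}}$.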
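The following Python sketch implements Post, CPre_I and UPre_I for an explicit finite alternating transition system, by enumerating the players' choices with `itertools.product`. The two-player system in the usage lines is hypothetical: in state 0, player a chooses {1, 2} or {3} and player b chooses {1, 3} or {2, 3}, so every combination of choices yields a single successor; the other states only loop.

```python
from itertools import product


def _outcomes(delta, q, chosen, others):
    """Successor sets obtained at q when the players in `others` range over
    all their choices, the sets in `chosen` being already fixed."""
    for combo in product(*(delta[o][q] for o in others)):
        yield frozenset.intersection(*chosen, *combo)


def post(delta, players, sigma):
    """Post(sigma): all possible successors of the states in sigma."""
    result = set()
    for q in sigma:
        for out in _outcomes(delta, q, (), players):
            result |= out
    return frozenset(result)


def cpre(delta, players, team, sigma):
    """States where `team` can force the successor into sigma."""
    others = [p for p in players if p not in team]
    states = delta[players[0]].keys()
    return frozenset(
        q for q in states
        if any(all(out <= sigma for out in _outcomes(delta, q, mine, others))
               for mine in product(*(delta[t][q] for t in team))))


def upre(delta, players, team, sigma):
    """States where `team` cannot force the successor outside sigma."""
    others = [p for p in players if p not in team]
    states = delta[players[0]].keys()
    return frozenset(
        q for q in states
        if all(any(out <= sigma for out in _outcomes(delta, q, mine, others))
               for mine in product(*(delta[t][q] for t in team))))


fs = frozenset
players = ["a", "b"]
delta = {
    "a": {0: [fs({1, 2}), fs({3})], 1: [fs({1})], 2: [fs({2})], 3: [fs({3})]},
    "b": {0: [fs({1, 3}), fs({2, 3})], 1: [fs({1})], 2: [fs({2})], 3: [fs({3})]},
}
```

In state 0, player a alone can force the successor into {3}, whereas player b can only keep it inside {1, 3}.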



5.2 Alternating-Time μ-Calculus

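Aμ formulas are generated by the grammar:

$$\varphi ::= p \mid \neg p \mid x \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \langle\langle I \rangle\rangle \bigcirc \varphi_1 \mid [[I]] \bigcirc \varphi_1 \mid (\mu x.\varphi_1) \mid (\nu x.\varphi_1)$$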
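Propositions $p$ are in a set $\Pi'$, variables $x$ are in a set $\mathcal{X}$, and teams $I$ are
subsets of a set $\Sigma'$ of players.

Given an alternating transition system $(\Sigma, Q, \Delta, \Pi, \pi)$ such that $\Pi' \subseteq \Pi$
and $\Sigma = \Sigma'$, and a mapping $\mathcal{E} : \mathcal{X} \to 2^Q$ from the variables to sets of states,
each formula $\varphi$ defines a set of states computable as follows:

$$\begin{aligned}
|p|_{\mathcal{E}} &= \pi(p) \\
|\neg p|_{\mathcal{E}} &= Q \setminus \pi(p) \\
|x|_{\mathcal{E}} &= \mathcal{E}(x) \\
|\varphi_1 \wedge \varphi_2|_{\mathcal{E}} &= |\varphi_1|_{\mathcal{E}} \cap |\varphi_2|_{\mathcal{E}} \\
|\varphi_1 \vee \varphi_2|_{\mathcal{E}} &= |\varphi_1|_{\mathcal{E}} \cup |\varphi_2|_{\mathcal{E}} \\
|\langle\langle I \rangle\rangle \bigcirc \varphi_1|_{\mathcal{E}} &= \mathit{CPre}_I(|\varphi_1|_{\mathcal{E}}) \\
|[[I]] \bigcirc \varphi_1|_{\mathcal{E}} &= \mathit{UPre}_I(|\varphi_1|_{\mathcal{E}}) \\
|\mu x.\varphi_1|_{\mathcal{E}} &= \mathrm{lfp}\ \lambda \rho.\,|\varphi_1|_{\mathcal{E}[x \mapsto \rho]} \\
|\nu x.\varphi_1|_{\mathcal{E}} &= \mathrm{gfp}\ \lambda \rho.\,|\varphi_1|_{\mathcal{E}[x \mapsto \rho]}
\end{aligned}$$

If $\varphi$ is closed, $|\varphi|_{\mathcal{E}}$ does not depend on $\mathcal{E}$, and we will write $|\varphi|$ for $|\varphi|_{\mathcal{E}}$.
Given a set of initial states $I$ and a closed formula $\varphi$, we will try to approximate
$I \cap |\varphi|$ or $\mathrm{lfp}\ \lambda X.(|\varphi| \cap (I \cup \mathit{Post}(X)))$, rather than $\mathit{Post}^*(I) \cap |\varphi|$.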



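5.3 Abstraction of Alternating Transition Systems

The application of abstract interpretation to alternating transition systems has
already been developed in [...], from a model-checking point of view. The
definitions are adapted to our notations as follows: the concrete lattice $P^\natural$ is
here $\wp(Q)$, so $\top^\natural = Q$, $\sqsubseteq^\natural = \subseteq$, $\bot^\natural = \emptyset$, $\sqcap^\natural = \cap$, $\sqcup^\natural = \cup$. $P^\sharp$ is the abstract
lattice, and there is a Galois connection $\wp(Q) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} P^\sharp$. We now define the abstract
operators $\pi^\sharp$, $\mathit{UPre}^\sharp_I$ and $\mathit{CPre}^\sharp_I$.

For each $p$ in $\Pi'$, let $\pi^\sharp(p)$ be an element of $P^\sharp$ such that $\pi(p) \subseteq \gamma(\pi^\sharp(p))$⁷,
and $\pi^\sharp(\neg p)$ an element of $P^\sharp$ such that $Q \setminus \pi(p) \subseteq \gamma(\pi^\sharp(\neg p))$.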

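For each subset $I$ of $\Sigma$, we define the abstract controllable predecessor relation
$\mathit{CPre}^\sharp_I \in P^\sharp \to P^\sharp$ and the abstract uncontrollable predecessor relation
$\mathit{UPre}^\sharp_I \in P^\sharp \to P^\sharp$. These relations must satisfy, for all $\sigma \subseteq Q$:

$$\begin{aligned}
\alpha \circ \mathit{CPre}_I(\sigma) &\sqsubseteq^\sharp \mathit{CPre}^\sharp_I \circ \alpha(\sigma) \\
\alpha \circ \mathit{UPre}_I(\sigma) &\sqsubseteq^\sharp \mathit{UPre}^\sharp_I \circ \alpha(\sigma)
\end{aligned}$$

These operators are not those used in [...] for the abstract model-checking
algorithm. The authors of that work use under-approximations of the concrete
relations to obtain a sound abstract model checker. In this paper, we use upper
approximations.

⁷ If $\alpha(\pi(p))$ is computable, we can take $\pi^\sharp(p) = \alpha(\pi(p))$.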
5.4 Combining Forward and Backward Abstractions

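We need an abstract successor operator for forward analysis. This abstract
successor relation $\mathit{Post}^\sharp$ must satisfy:

$$\alpha \circ \mathit{Post}(\sigma) \sqsubseteq^\sharp \mathit{Post}^\sharp \circ \alpha(\sigma)$$

Again, we define $\mathit{Post}^{*\sharp} = \lambda A.\,\mathrm{lfp}\ \lambda Y.(A \sqcup^\sharp \mathit{Post}^\sharp(Y))$. One can easily check
that:

$$\alpha \circ \mathit{Post}^*(\sigma) \sqsubseteq^\sharp \mathit{Post}^{*\sharp} \circ \alpha(\sigma)$$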

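We consider the closed Aμ formulas $\varphi$ generated by the grammar:

$$\begin{aligned}
\varphi ::=\ & p \mid \neg p \mid \varphi_1 \vee \varphi_2 \mid \varphi_1 \wedge \varphi_2 \mid \langle\langle I \rangle\rangle \bigcirc \varphi_1 \mid [[I]] \bigcirc \varphi_1 \\
\mid\ & \sigma x.\Big(\varphi \vee \bigvee_{I \in \wp(\Sigma) \setminus \{\emptyset\}} (\varphi_I \wedge \langle\langle I \rangle\rangle \bigcirc x) \vee \bigvee_{I' \in \wp(\Sigma) \setminus \{\emptyset\}} (\varphi'_{I'} \wedge [[I']] \bigcirc x)\Big)
\end{aligned}$$

with $\sigma \in \{\mu, \nu\}$. As before, the last production of the grammar can be rewritten
by exchanging $\vee$ and $\wedge$ without modifying the expressivity of the logic.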

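As in the non-game case, we can now define, if $\varphi$ is a formula, $\Omega_\varphi \in P^\sharp \to P^\sharp$
as follows:

$$\begin{aligned}
\Omega_p(S) &= \pi^\sharp(p) \sqcap^\sharp S \\
\Omega_{\neg p}(S) &= \pi^\sharp(\neg p) \sqcap^\sharp S \\
\Omega_{\varphi_1 \wedge \varphi_2}(S) &= \Omega_{\varphi_1}(S) \sqcap^\sharp \Omega_{\varphi_2}(S) \\
\Omega_{\varphi_1 \vee \varphi_2}(S) &= \Omega_{\varphi_1}(S) \sqcup^\sharp \Omega_{\varphi_2}(S) \\
\Omega_{\langle\langle I \rangle\rangle \bigcirc \varphi_1}(S) &= \mathit{CPre}^\sharp_I(\Omega_{\varphi_1}(\mathit{Post}^\sharp(S))) \\
\Omega_{[[I]] \bigcirc \varphi_1}(S) &= \mathit{UPre}^\sharp_I(\Omega_{\varphi_1}(\mathit{Post}^\sharp(S))) \\
\Omega_{\sigma x.(\varphi \vee \bigvee_I (\varphi_I \wedge \langle\langle I \rangle\rangle \bigcirc x) \vee \bigvee_{I'} (\varphi'_{I'} \wedge [[I']] \bigcirc x))}(S) &= L^\sharp_{\mathrm{lgfp}}\big(\lambda X.(S \sqcup^\sharp \mathit{Post}^\sharp(X)),\ \lambda Y.(\Omega_\varphi(\mathit{Post}^{*\sharp}(S)) \\
&\qquad \sqcup^\sharp \textstyle\bigsqcup^\sharp_I (\Omega_{\varphi_I}(\mathit{Post}^{*\sharp}(S)) \sqcap^\sharp \mathit{CPre}^\sharp_I(Y)) \\
&\qquad \sqcup^\sharp \textstyle\bigsqcup^\sharp_{I'} (\Omega_{\varphi'_{I'}}(\mathit{Post}^{*\sharp}(S)) \sqcap^\sharp \mathit{UPre}^\sharp_{I'}(Y)))\big)
\end{aligned}$$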



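Theorem 2. For every formula $\varphi$ generated by the grammar above, and every
$I \subseteq Q$:

$$\alpha(I \cap |\varphi|) \sqsubseteq^\sharp \alpha(I) \sqcap^\sharp \Omega_\varphi(\alpha(I))$$

Proof. The proof is essentially the same as in the non-game case.

All we need are the inclusions $I \cap \mathit{UCPre}_I(|\varphi_1|) \subseteq \mathit{UCPre}_I(\mathit{Post}(I) \cap |\varphi_1|)$,
with $\mathit{UCPre} = \mathit{UPre}$ or $\mathit{CPre}$, and the equivalence:

$$Y \subseteq b^\natural(X) \iff Y \subseteq b^\natural(X \cap \mathit{Post}(Y))$$

with $b^\natural = \lambda X.\big(A \cup \bigcup_I (B_I \cap \mathit{CPre}_I(X)) \cup \bigcup_{I'} (C_{I'} \cap \mathit{UPre}_{I'}(X))\big)$. These
properties are quite easy to check.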
6 A Simple Example



We illustrate the combination with a very short and easy example. We will 
analyze this small non-deterministic program: 

    (0) { x = 1 }
    (1) while (n>0) do {
    (2)   if (random in [0,1] = 0) then
    (3)     x = x * n;
    (4)   else
    (5)     x = x * (n-1);
    (6)   fi
    (7)   n = n - (input in [0,1]);
    (8) }
    (9)



Here, x and n are integers, (random in [0,1]) returns a random integer in [0, 1],
and (input in [0,1]) returns an integer in [0, 1] entered by the user (these two
commands behave in the same way in the transition relation). Control point (0)
is the program entry; we differentiate it from control point (1), which is the
while loop entry.

With the initial condition x=1 at control point (0), we will try to prove that the
user cannot be sure to obtain x=0 at control point (9), that is, that the initial
condition satisfies $\nu x.((\neg A) \wedge (B \vee \Diamond x) \wedge (C \vee \Box x))$, with $A$ meaning that x=0 at
control point (9), $C$ being the set of states at control point (2), and $B$ being the
set of states at the other control points.

As we use an upper approximation, we take the negation of the property, that is
(knowing that $\neg B = C$): $\mu x.(A \vee (B \wedge \Diamond x) \vee (C \wedge \Box x))$. So we must approximate
$\mathrm{lfp}\ \lambda x.(|A| \cup (|B| \cap \mathit{pre}(x)) \cup (|C| \cap \widetilde{\mathit{pre}}(x)))$.

We will use interval analysis, improved by local decreasing iterations [...] for
assignments in the backward analysis.

We must abstract $\mathit{post}(X)$, $\mathit{pre}(X)$ and $\widetilde{\mathit{pre}}(X)$. Abstract operators may be
described as systems of semantic equations. The program is almost
deterministic, and $\widetilde{\mathit{pre}}^\sharp$ is very close to $\mathit{pre}^\sharp$. The differences appear at control points
(2) and (7), but we only need to express them at control point (2), with the
equation:

$$B_2 = B_3 \sqcap B_5$$

($\sqcap$ being the intersection of abstract environments).
The following table gives the results of a single forward analysis ($F^\sharp(\top^\sharp)$), a single backward analysis ($B^\sharp(\top^\sharp)$), the intersection of both analyses ($F^\sharp(\top^\sharp) \sqcap^\sharp B^\sharp(\top^\sharp)$), and the first iteration of the combination ($B^\sharp(F^\sharp(\top^\sharp))$).



Lab. (var.)  F♯(⊤♯)      B♯(⊤♯)      F♯(⊤♯) ⊓♯ B♯(⊤♯)  B♯(F♯(⊤♯))
(0) x        [1]         [−∞, +∞]    [1]               ∅
    n        [−∞, +∞]    [−∞, +∞]    [−∞, +∞]          ∅
(1) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [−∞, +∞]    [−∞, +∞]    [−∞, +∞]          [−∞, +∞]
(2) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(3) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(4) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(5) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0, +∞]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(6) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(7) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [1, +∞]     [−∞, +∞]    [1, +∞]           [1, +∞]
(8) x        [0, +∞]     [−∞, +∞]    [0, +∞]           [0]
    n        [0, +∞]     [−∞, +∞]    [0, +∞]           [0, +∞]
(9) x        [0, +∞]     [0]         [0]               [0]
    n        [−∞, +∞]    [−∞, +∞]    [−∞, +∞]          [−∞, +∞]


The next iteration of the combination leads to ∅ everywhere, which is of course the abstract fixpoint $T^\sharp$. So $T^\sharp = \emptyset$ (which is not equal to n

As, for this kind of temporal property, we obtained 

the expected result. 



7 Conclusion 

We have proved that the combination of forward and backward analyses remains valid for checking complex temporal properties. Whereas this combination is useless when dealing with finite domains, and not very useful when abstractions are done by hand (as in the model-checking approach), we expect it to significantly enhance the results given by an automatic abstract analyzer of temporal properties.

Using the results of this article will require over-approximations of the pre operator (or of the predecessor operators of game logic), which have not been much studied until now. We also need a method to compute over-approximations of greatest fixpoints, since lower narrowing operators give poor results.
Abstract. In this paper we design abstract domains for numerical power analysis. These domains are conceived to discover properties of the following type: “The integer (or rational) variable X at a given program point is the numerical power of c with the exponent having a given property π”, where c and π are automatically determined. A family of domains is presented: two of them assume that the exponent can be any natural or integer value, while the others also include the analysis of properties of the exponent set. Relevant lattice-theoretic properties of these domains are proved, such as the absence of infinite ascending chains and the structure of their meet-irreducible elements. These domains are applied in the analysis of integer powers in imperative programs and in the analysis of probabilistic concurrent programming, with probabilistic non-deterministic choice.



Keywords: Abstract interpretation, static program analysis, numerical power 
analysis, probabilistic analysis. 

1 Introduction 

Abstract interpretation [?] is a general theory for approximating the semantics of programming languages, including static program analysis as a special case. The design of a static program analyzer consists of the design of an approximate decidable semantics, called the abstract semantics, which is systematically derived from the concrete (standard) semantics of the language. This approach has several well-known advantages with respect to other methods: (1) the analysis is fully described and constructively derived from the way the concrete data and control flows are approximated; (2) its correctness with respect to the concrete semantics can be immediately proved formally by construction; (3) new and more advanced analyses can be systematically conceived by modifying the abstraction methods [?]. The analysis consists here in the solution of a system of fixpoint equations associated with the concrete semantics of the program, where each equation is interpreted in an abstract domain returning an approximated transformation of the program invariant at every program point. In order to make the analysis effective, the iterative solution of the approximated system of equations has to terminate. This may be achieved either statically, by designing suitable abstract domains with no infinite ascending chains, or dynamically while fixpoints are computed, by using widening/narrowing operations to speed up convergence [?]. In any case, the design of the abstract domain is central in the construction of any abstract interpretation, and therefore of any static program analysis algorithm. An abstract domain is a set of mathematical objects representing concrete properties of programs, e.g. the properties of their data structures. In the standard adjoint framework of abstract interpretation [?], a pair of adjoint functions relates the concrete and abstract domains in such a way that the unique best (i.e. most precise) approximated property can always be associated with each concrete object by means of an abstraction function. This is an ideal situation, which provides the designer of the analysis with a number of powerful mathematical tools for proving the correctness and, in some cases, the optimality of the analysis.

The main results. The aim of this paper is to present a family of domains for numerical power analysis, namely the power analysis of numerical (integer or rational) values in the framework of abstract interpretation. These domains are conceived to discover properties of the following type: “The integer (or rational) variable X at a given program point is the numerical power of c with the exponent having a given property π”, where c and π are automatically determined: c is an integer or rational number and π is a property of natural or integer numbers. Consider for instance the following program P, also known as the Collatz problem [?]:

while n ≠ 1 do
  π2: n := if even(n) then n/2
           else 3n + 1
endw

It is immediate that $\{\exists k.\ n = c^k\}\ P\ \{n = 1\}$ is a valid Hoare triple, with c being any power of 2, e.g. c = 2; namely, the program P terminates with n = 1 whenever the input $n = 2^k$ holds. Numerical power analysis is devoted to automatically discovering these situations.
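As a small empirical illustration of this triple, the Python sketch below runs program P from inputs $n = 2^k$ and checks that every value reached stays in $\{2\}^{\mathbb{N}}$ and that the loop exits after k iterations. This is only a bounded test, not a proof, and the helper names are hypothetical.

```python
def collatz_trace(n: int) -> list[int]:
    """Values taken by n in program P, initial value included (assumes n >= 1)."""
    trace = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        trace.append(n)
    return trace


def is_power_of_two(n: int) -> bool:
    """True iff n = 2**k for some natural k (n >= 1)."""
    return n >= 1 and n & (n - 1) == 0


for k in range(20):
    trace = collatz_trace(2 ** k)
    # Invariant: n stays a power of 2, and each iteration halves it.
    assert all(is_power_of_two(v) for v in trace)
    assert len(trace) == k + 1

print(collatz_trace(16))  # [16, 8, 4, 2, 1]
```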

We present a family of domains for numerical power analysis, denoted $P(\mathbb{B}, E)$, which are parametric in the set of possible exponents. In particular $\mathbb{B} \in \{\mathbb{Z}, \mathbb{Q}\}$, while E can be a generic abstract domain on $\mathbb{N}$ or $\mathbb{Z}$, with specific properties that we will define in the following. Clearly, if E is an abstract domain of integers then $\mathbb{B} = \mathbb{Q}$, since negative exponents yield non-integer values. We distinguish two cases: (1) when $E = \{\mathbb{X}\}$ and $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$, the domain $P(\mathbb{B}, E)$ determines whether a variable has a value of the kind $c^k$ with $c \in \mathbb{B}$ and k any value in $\mathbb{X}$; (2) when E itself is an abstract domain, we are also able to analyze the properties of the exponent set. All the domains of this family are fully specified and proved correct in Cousot's standard adjoint framework of abstract interpretation. Moreover, they share relevant lattice-theoretic properties. In particular, while all these domains may have infinite descending chains, they all satisfy the ascending chain condition ACC (no infinite ascending chains are permitted). This holds when the domain E is itself an ACC domain.
This is particularly important in static program analysis, as it proves that the corresponding analyses are all decidable (no infinite loops are allowed in the abstract semantics) and no widening is necessary to achieve termination. The structure of these domains is presented by providing a characterization of their lattice-theoretic structure in terms of their meet-irreducible elements. The abstract interpretation of standard arithmetic operations on numerical powers is proved correct and optimal.

The fact that a program variable belongs to a numerical power of the form $c^k$ is quite a rare event in standard programming. This is justified formally by the fact that few operations allow a variable to keep its values within a given numerical power. However, as is often the case for domains approximating numerical values, numerical power analysis becomes particularly informative when combined with other domains for the analysis of numerical values, like interval analysis [?]. This is the case in all the applications where the detection of program invariants that involve numerical powers at a given control point contributes to program optimization, e.g. to detect the upper/lower bound in memory allocation in hardware design specification by high-level programming languages (e.g. VHDL specification). An important field of application of numerical power analysis is the static analysis of probabilistic programming languages, where non-deterministic choice is replaced by a randomized choice. In this case, if we have a trace of intermediate states $s_i$ with probability $p_i \in [0, 1]$, $(s_0, p_0) \to \dots \to (s_i, p_i) \to (s_{i+1}, p_{i+1}) \to \dots$, then the probability of the terminal state (if any) is the product $\prod_i p_i$. It is often the case that the resulting probability is a numerical power of some fraction. Numerical power analysis is accurate for approximating rational products, and therefore it can be used to approximate the probability resulting from a randomized computation. We apply this idea to approximate the probability of probabilistic concurrent constraint programs.

Related works. A number of domains for static analysis of numerical values have been proposed in the literature. These include constant propagation [?], Granger's arithmetical congruence analysis [?], interval analysis, affine relations, linear restraints [?], linear congruence relation analysis [?], and trapezoid congruence analysis [?]. None of these domains includes numerical power analysis. Like congruence analysis, numerical power analysis belongs to the family of non-convex domains, namely of those domains whose objects describe a non-convex set of numbers. The relation with numerical congruence analysis is even stronger, as congruence analysis can be easily generalized to the family of abstract domains generated by suitable subgroups of abelian groups. This is not the case for numerical powers, as numerical power analysis cannot be reconstructed as a special case of Granger's congruence analysis, namely as a residue class (or congruence class) of a suitable commutative group. For $P(\mathbb{Z}, \{\mathbb{N}\})$ this is due to the fact that the integers with multiplication do not form a group. Similar observations hold for $P(\mathbb{Q}, \{\mathbb{N}\})$. In his PhD thesis [?], Granger introduced the abstract domain of multiplicative congruences over $\mathbb{Q}$. This analysis is essentially an analysis of rational powers with integer exponents, namely the domain $P(\mathbb{Q}, \{\mathbb{Z}\})$. We include this domain in our work as a special case of the more general pattern $P(\mathbb{Q}, E)$, with E being an abstract interpretation of $\wp(\mathbb{Z})$.

2 Preliminaries 

2.1 Basic Mathematical Notions 

If S and T are sets, then $\wp(S)$ denotes the power-set of S, $S \setminus T$ denotes the set-difference between S and T, $S \subset T$ denotes strict inclusion, and for a function $f : S \to T$ and $X \subseteq S$, $f(X) = \{f(x) \mid x \in X\}$. By $g \circ f$ we denote the composition of the functions f and g, i.e., $g \circ f = \lambda x.g(f(x))$. The notation $\langle P, \le \rangle$ denotes a poset P with ordering relation $\le$, while $\langle P, \le, \vee, \wedge, \top, \bot \rangle$ denotes a complete lattice P, with ordering $\le$, least upper bound (lub for short) $\vee$, greatest lower bound (glb for short) $\wedge$, greatest element (top) $\top$, and least element (bottom) $\bot$. If P is a poset with bottom $\bot$ then $a \in P$ is an atom if a covers $\bot$, i.e., $\bot < a$ and for all $x \in P$, $\bot < x \le a$ implies $x = a$. Dual-atoms are dually defined. An element $x \in P$ is meet-irreducible if $x = a \wedge b \Rightarrow x \in \{a, b\}$. The set of meet-irreducible elements in P is denoted $Mirr(P)$. A poset is ACC if for each ascending chain $\{x_1 \le x_2 \le \dots \le x_n \le \dots\}$ there is $k \in \mathbb{N}$ such that $\forall n \ge 0 : x_k = x_{k+n}$, i.e. every ascending chain has a finite limit. A DCC poset is dually defined. Consider $S \subseteq P$; then the downward closure of S is defined as $\downarrow S = \{x \in P \mid \exists y \in S .\ x \le_P y\}$. If A is a set then $(A)^u$ denotes the set of its upper bounds and $|A|$ the number of its elements. If $n \in \mathbb{Z}$ then $|n|$ represents its modulus. If $k, n \in \mathbb{N}$ and $\exists k' \in \mathbb{N} .\ n = k \cdot k'$ then we write $k \mid n$. If $m \in \mathbb{N}$ is the least common multiple of two values $k, h \in \mathbb{N}$ then we write $m = lcm(k, h)$, and if it is their greatest common divisor we write $m = gcd(k, h)$.



2.2 Abstract Interpretation 

Abstract domains can be equivalently formulated either in terms of Galois connections or of closure operators [2]. An upper closure operator on a poset P is an operator $\rho : P \to P$ which is monotone, idempotent, and extensive ($\forall x \in P.\ x \le_P \rho(x)$). The set of these operators is denoted by $uco(P)$. Let $\langle C, \le, \vee, \wedge, \top, \bot \rangle$ be a complete lattice. A basic property of closure operators is that each closure is uniquely determined by the set of its fix-points $\rho(C)$. When C is a complete lattice, both $\langle uco(C), \sqsubseteq, \sqcup, \sqcap, \lambda x.\top, \lambda x.x \rangle$ and $\langle \rho(C), \le, \vee_\rho, \wedge, \top, \rho(\bot) \rangle$ are complete lattices, where $\vee_\rho A = \rho(\vee A)$. A set $A \subseteq C$ is the set of fix-points of an upper closure on C iff A is a Moore-family of C, i.e., $A = \mathcal{M}(A) = \{\wedge S \mid S \subseteq A\}$, where $\wedge \emptyset = \top \in \mathcal{M}(A)$. For any $A \subseteq C$, $\mathcal{M}(A)$ is called the Moore-closure of A in C, i.e., $\mathcal{M}(A)$ is the least (w.r.t. set-inclusion) subset of C which contains A and is a Moore-family of C. We say that C is meet-generated by its meet-irreducible elements iff $C = \mathcal{M}(Mirr(C))$. If A and C are posets, and $\alpha : C \to A$ and $\gamma : A \to C$ are monotone functions such that $x \le \gamma(\alpha(x))$ and $\alpha(\gamma(x)) \le x$, then the quadruple $(A, \alpha, \gamma, C)$ is called a Galois connection (GC for short) or adjunction between C and A. Note that in a GC, for any $x \in C$ and $y \in A$: $\alpha(x) \le_A y \iff x \le_C \gamma(y)$, $\gamma(y) = \vee\{x \mid \alpha(x) \le y\}$ and $\alpha(x) = \wedge\{y \mid x \le \gamma(y)\}$. If in addition $\alpha \circ \gamma = \lambda x.x$, then $(A, \alpha, \gamma, C)$ is a Galois insertion (GI) of A in C. In this setting, the concrete and abstract domains, respectively C and A, are assumed to be complete lattices and are related by abstraction and concretization maps forming a GC $(A, \alpha, \gamma, C)$. Following a standard terminology, A is called an abstraction of C, and C is a concretization of A. If $(A, \alpha, \gamma, C)$ is a GI, then each value of the abstract domain A is useful in representing C, because all the elements of A represent distinct members of C, γ being injective. Any GC may be lifted to a GI by identifying in an equivalence class those values of the abstract domain with the same concretization. This process is known as reduction of the abstract domain. The following result relates GIs with closure operators, and hence with Moore families.

Theorem 1 ([?]). Let C and A be two complete lattices; then $(A, \alpha, \gamma, C)$ is a GI iff A is isomorphic to a Moore family of C.

Let $\{A_i\}_{i \in I} \subseteq uco(C)$: $\sqcap_{i \in I} A_i$ is (isomorphic to) the reduced product (basically cartesian product plus reduction) of all the $A_i$'s, or, equivalently, it is the most abstract domain which is more concrete than every $A_i$. Let us remark that the reduced product can also be characterized as the Moore-closure of set-union, i.e. $\sqcap_{i \in I} A_i = \mathcal{M}(\cup_{i \in I} A_i)$. Let Program denote the set of (syntactically well-formed) programs. The concrete standard semantics is a function $[\![\cdot]\!] : Program \to C$, where C is a concrete semantic domain of denotations, which we assume to be a complete lattice. If an abstract interpretation is specified by a GI $(A, \alpha, \gamma, C)$ and by an abstract semantic function $[\![\cdot]\!]^\sharp : Program \to A$, then $[\![\cdot]\!]^\sharp$ is a sound abstract semantics, or (correctly) approximates $[\![\cdot]\!]$, if for any program P, $\alpha([\![P]\!]) \le_A [\![P]\!]^\sharp$, or, equivalently, $[\![P]\!] \le_C \gamma([\![P]\!]^\sharp)$. When the concrete semantics is specified in fix-point form, i.e. $[\![P]\!] = lfp(F_P) \in C$ for some given concrete domain C and semantic operator $F_P$, then, given a corresponding abstract semantics $\mathcal{S}^\sharp = \langle A, F_P^\sharp \rangle$ and a GI $(A, \alpha, \gamma, C)$, $\mathcal{S}^\sharp$ is called a sound abstraction of $\mathcal{S}$ if for all $P \in Program$, $\alpha(lfp(F_P)) \le_A lfp(F_P^\sharp)$. This soundness condition can be more easily verified point-wise by checking whether, for all $P \in Program$, $\alpha \circ F_P \le_A F_P^\sharp \circ \alpha$, or, equivalently, $\alpha \circ F_P \circ \gamma \le_A F_P^\sharp$. If $\alpha \circ F_P \circ \gamma = F_P^\sharp$ then $F_P^\sharp$ is optimal [2].

3 The Domain of Numerical Powers 

In this section we introduce a class of domains for power analysis of numerical 
variables. Consider the following program fragment:

n := 1;
while n < 100 do
  n := n * 2
endw

This program computes the smallest power of 2 which is greater than 100, namely n = 128. In order to automatically discover the invariant of this program, i.e. $Inv = \{n = 2^k \wedge k \in \mathbb{N} \wedge n \in [1, 128]\}$, we need to design a domain whose objects represent numerical powers of integer numbers. Variables have values in $\mathbb{Z}$ or $\mathbb{Q}$, therefore the collection of possible values that a variable may assume during the execution of a program is an element of $\wp(\mathbb{Z})$ or $\wp(\mathbb{Q})$. Recall that $\langle \wp(\mathbb{X}), \subseteq, \mathbb{X}, \emptyset, \cap, \cup \rangle$ with $\mathbb{X} \in \{\mathbb{Z}, \mathbb{Q}\}$ is a complete Boolean lattice, representing the concrete domain of interpretation. We consider standard numerical types: natural, integer and rational. We identify the elements that we consider useful for the kind of analysis that we are going to make, namely we define the collection of sets each containing natural or integer powers of a particular integer or rational value. Let $\mathbb{B} \in \{\mathbb{Z}, \mathbb{Q}\}$ and $E \in \{E_{\mathbb{X}} \subseteq \wp(\mathbb{X}) \mid \mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}\}$. Consider $a \in \mathbb{B}$ and $X \in E$; then we define the set of the X-powers of a as $\{a\}^X = \{a^k \mid k \in X\}$. In the following, if X is a numerical set we use the notation $X_0$ for $X \setminus \{0\}$. The following proposition is immediate.

Proposition 1. Let $a, b \in \mathbb{B}$ and $k \in \mathbb{Z}_0$ such that $a = b^k$; then $\{a\}^X = \{b\}^{kX}$.

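Definition 1. Let $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}, \mathbb{Q}\}$ and $a \in \mathbb{X}$. The exponential atom (e-atom) of a is the least value $b \in \mathbb{X}$ such that $a = b^k$ for some $k \in \mathbb{N}_0$. The (unique) e-atom of a is denoted $a_{\downarrow}$. The value a is atomic if $a = a_{\downarrow}$.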

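Corollary 1.

• Let $X \in E = E_{\mathbb{Z}}$ and $a \in \mathbb{Q}$; then $\{a\}^X = \{1/a\}^{-X}$.

• Let $X \in E$ and $a \in \mathbb{B}$; then $\{a\}^X = \{a_{\downarrow}\}^{kX}$, where $k \in \mathbb{N}$ is such that $a = a_{\downarrow}^k$.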

At this point we can introduce the class of domains which include all the X-powers of integer or rational values.

Definition 2. $P(\mathbb{B}, E) = \{\emptyset, \mathbb{B}\} \cup \{\{a\}^X \mid a \in \mathbb{B} \wedge X \in E\}$, with the condition that if $E = E_{\mathbb{Z}}$ then $\mathbb{B} = \mathbb{Q}$.

Corollary 1 tells us that in $P(\mathbb{B}, E)$ we can first work only on the sets of powers of atomic elements and then, under particular conditions, extend all the results obtained to the whole $P(\mathbb{B}, E)$. In particular, when $\mathbb{B} = \mathbb{Q}$ and $E = E_{\mathbb{Z}}$, we can work only with sets of powers of values greater than one and then naturally extend all the results to the sets of powers of all the rational values. Hence, in the following, when we deal with atomic elements in $\mathbb{Q}$, we always consider them greater than one. Note also that $P(\mathbb{B}, E) = \{\emptyset, \mathbb{B}\} \cup \{\{a\}^X \mid a \in \mathbb{B}^+ \wedge X \in E\} \cup \{\{a\}^X \mid a \in \mathbb{B}^- \wedge X \in E\}$ is the coalesced sum of two disjoint domains with common top and bottom elements, $\mathbb{B}$ and $\emptyset$ respectively. Namely, we can see the domain $P(\mathbb{B}, E)$ as composed of two disjoint sub-domains, one with positive bases and one with negative bases. We can also note that if $a \in \mathbb{B}^+$ and $X \in E$ then $\{|b| \mid b \in \{a\}^X\} = \{|b| \mid b \in \{-a\}^X\}$ but $\{a\}^X \neq \{-a\}^X$. This means that the elements $\{a\}^X$ and $\{-a\}^X$ consist of the same values up to sign. This symmetry is important because it allows us to extend any property that we are going to prove on $\{\emptyset, \mathbb{B}\} \cup \{\{a\}^X \mid a \in \mathbb{B}^+ \wedge X \in E\}$ to the whole domain $P(\mathbb{B}, E)$. The following lemma says that, if there exist $h \in X_0$ and $k \in Y_0$ such that $a^h = b^k$, and both a and b are atomic, then $a = b$.
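Lemma 1.

1. Let $a, b \in \mathbb{B}$ be atomic with $a \neq b$ and $X, Y \in \wp(\mathbb{X})$, $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$; then
$\{a\}^X \cap \{b\}^Y = \begin{cases} \{1\} & \text{if } 0 \in X \cap Y \\ \emptyset & \text{otherwise} \end{cases}$

2. If $a \in \mathbb{B}$ is atomic and $\langle E, \subseteq \rangle$ has a top element $\top$, then $\{a\}^{\top}$ is a dual-atom of $P(\mathbb{B}, E)$.

This result allows us to say that all the domains $\langle \downarrow\{a\}^X, \subseteq \rangle$, when a is atomic, are completely disjoint, i.e. they have no elements in common. Figure 1 shows the domain $P(\mathbb{Z}, \{\mathbb{N}\})$.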



4 Analyzing the Base 

In this section we focus our attention on the analysis of the base of numerical powers only. This is achieved by considering that exponents may have any natural or integer value. We introduce two basic abstract domains $P(\mathbb{B}, \{\mathbb{X}\})$, where $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$, and we prove their algebraic properties. In particular, we want to show that they form complete lattices and are both abstractions of $\wp(\mathbb{B})$. In this way we are able to analyze the described property for any integer and rational variable. First we prove that $P(\mathbb{B}, \{\mathbb{X}\})$ is a lattice (we define the finite lub and glb operations) and then, by showing a GI between $\wp(\mathbb{B})$ and $P(\mathbb{B}, \{\mathbb{X}\})$, we conclude, by Theorem 1, that $P(\mathbb{B}, \{\mathbb{X}\})$ is a Moore family of $\wp(\mathbb{B})$. Therefore it is a complete lattice and, in particular, an abstraction of $\wp(\mathbb{B})$.

Lemma 2. Let $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$ and $k, h \in \mathbb{X}$; then $k\mathbb{X} \cap h\mathbb{X} = lcm(k, h)\mathbb{X}$.
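Lemma 2 can be sanity-checked on a finite window of integers with the Python sketch below (it needs Python 3.9+ for `math.lcm`). The window bounds and function names are illustrative choices of ours, and a bounded check does not, of course, replace the proof.

```python
from math import lcm


def multiples(k: int, lo: int, hi: int) -> set[int]:
    """Elements of kZ lying in [lo, hi]; 0Z is taken to be {0}."""
    if k == 0:
        return {0} if lo <= 0 <= hi else set()
    return {v for v in range(lo, hi + 1) if v % k == 0}


def lemma2_holds(k: int, h: int, lo: int, hi: int) -> bool:
    """Check kX ∩ hX = lcm(k, h)X, restricted to the window [lo, hi]."""
    return multiples(k, lo, hi) & multiples(h, lo, hi) == multiples(lcm(k, h), lo, hi)


# X = N: window [0, 300]; X = Z: window [-300, 300].
assert all(lemma2_holds(k, h, 0, 300) for k in range(13) for h in range(13))
assert all(lemma2_holds(k, h, -300, 300) for k in range(-6, 7) for h in range(-6, 7))
print(sorted(multiples(4, 0, 40) & multiples(6, 0, 40)))  # [0, 12, 24, 36]
```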



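Theorem 2. Let $a, b \in \mathbb{B}$ such that $a = a_{\downarrow}^k$ and $b = b_{\downarrow}^h$ for some $k, h \in \mathbb{N}$. Consider $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$; then

$\{a\}^{\mathbb{X}} \cap \{b\}^{\mathbb{X}} = \begin{cases} \{d^{lcm(k,h)}\}^{\mathbb{X}} & \text{if } a_{\downarrow} = b_{\downarrow} = d \\ \{1\} & \text{otherwise} \end{cases}$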



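Proof. Consider $a_{\downarrow} = b_{\downarrow} = d$:

$\{d^k\}^{\mathbb{X}} \cap \{d^h\}^{\mathbb{X}} = \{d\}^{k\mathbb{X}} \cap \{d\}^{h\mathbb{X}} = \{d^w \mid w \in k\mathbb{X}\} \cap \{d^w \mid w \in h\mathbb{X}\} = \{d^w \mid w \in k\mathbb{X} \wedge w \in h\mathbb{X}\} = \{d^w \mid w \in k\mathbb{X} \cap h\mathbb{X}\} = \{d\}^{lcm(k,h)\mathbb{X}} = \{d^{lcm(k,h)}\}^{\mathbb{X}}$

by Corollary 1 and Lemma 2. If $a_{\downarrow} \neq b_{\downarrow}$ then the intersection is $\{1\}$, by what we proved in Lemma 1 (point 1), because $\{a\}^{\mathbb{X}} \cap \{b\}^{\mathbb{X}} = \{a_{\downarrow}\}^{k\mathbb{X}} \cap \{b_{\downarrow}\}^{h\mathbb{X}} = \{1\}$.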

Let $\mathbb{B} = \mathbb{Z}$ and $\mathbb{X} = \mathbb{N}$; then, in practice, to find the intersection of two sets of powers $\{a\}^{\mathbb{N}}$ and $\{b\}^{\mathbb{N}}$, we have to find the e-atoms of the values a and b. For example, consider a and suppose that its factorization is $\prod_{i=1}^{s} p_i^{m_i}$; then we take $k = gcd(m_1, \dots, m_s)$, which is such that $a = a_{\downarrow}^k$, and if we let $n_i = m_i / k$ for all $i \le s$, then $a_{\downarrow} = \prod_{i=1}^{s} p_i^{n_i}$. Intuitively, k is the greatest exponent that is common to all the exponents $m_i$ in the factorization. In this way, the value whose prime factorization is obtained by dividing the exponents by k is the e-atom of a, because it is, by construction, the least value such that a is a power of it. We can do the same for b, finding that for some $h \in \mathbb{N}$ we have $b = b_{\downarrow}^h$. Now, if $a_{\downarrow} = b_{\downarrow} = d$, we find the intersection as

$\{a\}^{\mathbb{N}} \cap \{b\}^{\mathbb{N}} = \{d^{lcm(k,h)}\}^{\mathbb{N}}$.

Example 1. Consider $a, b \in \mathbb{Z}$; we have to determine $c \in \mathbb{Z}$ such that $\{c\}^{\mathbb{N}} = \{a\}^{\mathbb{N}} \cap \{b\}^{\mathbb{N}}$, where $a = 144 = 2^4 \cdot 3^2$ and $b = 1728 = 2^6 \cdot 3^3$. Then, using the notation of the theorem, we have $k = gcd(2, 4) = 2$, so $a_{\downarrow} = 2^2 \cdot 3$, namely the same prime factors of a with the exponents divided by their gcd. Moreover $h = gcd(6, 3) = 3$, so $b_{\downarrow} = 2^2 \cdot 3$; hence the two values share the same e-atom, and therefore the intersection is not just $\{1\}$. So we can easily see that $a = (2^2 \cdot 3)^2$, $b = (2^2 \cdot 3)^3$ and $lcm(k, h) = lcm(2, 3) = 6$. At this point we can conclude that $c = (2^2 \cdot 3)^6 = 2985984$ is the value which identifies the intersection.
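The constructive method of Theorem 2 can be sketched in Python as follows, restricted to positive bases $a \ge 2$ and exponent set $\mathbb{N}$. The helper names `factorize`, `e_atom` and `meet` are hypothetical, trial-division factorization is only meant for small inputs, and the base 1 is used to encode the element $\{1\}$ (since $\{1\}^{\mathbb{N}} = \{1\}$).

```python
from functools import reduce
from math import gcd, lcm


def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 by trial division: {prime: exponent}."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def e_atom(a: int) -> tuple[int, int]:
    """Return (atom, k) with a == atom**k, k = gcd of the prime exponents.

    Assumes a >= 2 (positive bases only).
    """
    exps = factorize(a)
    k = reduce(gcd, exps.values())
    atom = 1
    for p, m in exps.items():
        atom *= p ** (m // k)
    return atom, k


def meet(a: int, b: int) -> int:
    """Base c with {c}^N = {a}^N ∩ {b}^N (Theorem 2); c = 1 encodes {1}."""
    d1, k = e_atom(a)
    d2, h = e_atom(b)
    if d1 != d2:
        return 1  # different e-atoms: only a**0 = b**0 = 1 is shared
    return d1 ** lcm(k, h)


print(e_atom(144), e_atom(1728))  # (12, 2) (12, 3)
print(meet(144, 1728))            # 2985984
```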



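When $\mathbb{B} = \mathbb{Q}$, $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$, and we want to find the intersection of two sets of powers $\{p\}^{\mathbb{X}}$ and $\{q\}^{\mathbb{X}}$, we have to calculate the e-atoms of the values p and q. First of all, if $\mathbb{X} = \mathbb{Z}$ and $p < 1$, then we consider $\{p\}^{\mathbb{X}} = \{1/p\}^{\mathbb{X}}$; we can do the same if $q < 1$. For this reason we can suppose both p and q greater than one. Consider $p = a/b$; then it is clear that $p = a_{\downarrow}^{k_1} / b_{\downarrow}^{k_2}$ for some $k_1, k_2 \in \mathbb{N}$. Now, if $m = gcd(k_1, k_2)$, we can rewrite the equality as $p = (a_{\downarrow}^{k'_1} / b_{\downarrow}^{k'_2})^m$, where $k_1 = k'_1 m$ and $k_2 = k'_2 m$. Hence $p = p_{\downarrow}^m$ and $p_{\downarrow} = a_{\downarrow}^{k'_1} / b_{\downarrow}^{k'_2}$. We can do the same for q and then calculate the intersection as in the previous example.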



Example 2. Let $p = \frac{54^4}{5^8} = \frac{(2 \cdot 3^3)^4}{5^8}$ and $q = \frac{54^6}{5^{12}} = \frac{(2 \cdot 3^3)^6}{5^{12}}$. We can see that $k = gcd(4, 8) = 4$ and $h = gcd(6, 12) = 6$; then $p_{\downarrow} = \frac{2 \cdot 3^3}{5^2} = q_{\downarrow} = r$, which is the product of the same prime factors of p with the exponents divided by their gcd. Now we can find the intersection of the two values, $p = r^4$ and $q = r^6$, as $r^m$, where $m = lcm(4, 6) = 12$. Hence the intersection is represented by $\left(\frac{2 \cdot 3^3}{5^2}\right)^{12}$.
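The same procedure for rational bases can be sketched with Python's `fractions.Fraction`. As a simplification of our own, this sketch does not first normalize to p > 1: primes of the denominator simply carry negative exponents, and the e-atom exponent is the gcd of their absolute values, which gives the same result on Example 2. The helper names are hypothetical and the sketch assumes p > 0 and p ≠ 1.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def rational_e_atom(p: Fraction) -> tuple[Fraction, int]:
    """Return (atom, k) with p == atom**k and k maximal (p > 0, p != 1)."""
    exps = dict(factorize(p.numerator))
    for q, m in factorize(p.denominator).items():
        exps[q] = -m  # numerator and denominator are coprime
    k = reduce(gcd, (abs(e) for e in exps.values()))
    atom = Fraction(1)
    for q, e in exps.items():
        atom *= Fraction(q) ** (e // k)
    return atom, k


def rational_meet(p: Fraction, q: Fraction) -> Fraction:
    """Base c with {c}^N = {p}^N ∩ {q}^N; Fraction(1) encodes {1}."""
    r1, k = rational_e_atom(p)
    r2, h = rational_e_atom(q)
    return r1 ** lcm(k, h) if r1 == r2 else Fraction(1)


p = Fraction(54**4, 5**8)
q = Fraction(54**6, 5**12)
print(rational_e_atom(p), rational_e_atom(q))
print(rational_meet(p, q) == Fraction(54, 25) ** 12)  # True
```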



As far as the least upper bound operation is concerned, we consider the following theorem.



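Theorem 3. Let $a, b \in \mathbb{B}$ such that $a = a_{\downarrow}^k$ and $b = b_{\downarrow}^h$ for some $k, h \in \mathbb{N}$. Consider $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$; then

$\{a\}^{\mathbb{X}} \vee \{b\}^{\mathbb{X}} = \begin{cases} \{d^{gcd(k,h)}\}^{\mathbb{X}} & \text{if } a_{\downarrow} = b_{\downarrow} = d \\ \mathbb{B} & \text{otherwise} \end{cases}$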



Proof. Consider $a_{\downarrow} = b_{\downarrow} = d$. Then surely $\{a\}^{\mathbb{X}}, \{b\}^{\mathbb{X}} \subseteq \{d^{gcd(k,h)}\}^{\mathbb{X}}$, because both a and b are equal to a power of $d^{gcd(k,h)}$. Consider now $\{c\}^{\mathbb{X}} \supseteq \{a\}^{\mathbb{X}}, \{b\}^{\mathbb{X}}$ with $c \in \mathbb{B}$. Then, by definition of inclusion, $\exists k' \in \mathbb{X}.\ a = c^{k'}$ and $\exists h' \in \mathbb{X}.\ b = c^{h'}$, i.e. $d^k = c^{k'}$ and $d^h = c^{h'}$. Because d is atomic, these equalities imply that $c_{\downarrow} = d$ by Lemma 1, so there exists $z \in \mathbb{X}$ such that $c = d^z$. But z must be a divisor of both k and h, by the equalities written above. Then, by definition of gcd, z divides $gcd(k, h)$, namely $c \le d^{gcd(k,h)}$. Now we have to prove that $\{c\}^{\mathbb{X}}$ contains the candidate least upper bound; one way is to prove that

$\{d^{gcd(k,h)}\}^{\mathbb{X}} \cap \{c\}^{\mathbb{X}} = \{e\}^{\mathbb{X}}$

for some $e \in \mathbb{B}$. This implies that $\{a\}^{\mathbb{X}}, \{b\}^{\mathbb{X}} \subseteq \{e\}^{\mathbb{X}}$, and we have just seen that this implies $e \le d^{gcd(k,h)}$. By the properties of intersection we also have $\{e\}^{\mathbb{X}} \subseteq \{d^{gcd(k,h)}\}^{\mathbb{X}}$, so $e \ge d^{gcd(k,h)}$. We can conclude that $e = d^{gcd(k,h)}$, namely $\{c\}^{\mathbb{X}} \supseteq \{d^{gcd(k,h)}\}^{\mathbb{X}}$.

As for the intersection, we can describe an analogous constructive method to systematically derive the representative value of the least upper bound of two generic elements of the domain.

Example 3. Consider $a, b \in \mathbb{Z}$, $a = 3^{12} \cdot 5^6$ and $b = 3^{16} \cdot 5^8$; we want to find the value which represents $\{a\}^{\mathbb{N}} \vee \{b\}^{\mathbb{N}}$. As in Example 1, consider $k = gcd(6, 12) = 6$ and $h = gcd(8, 16) = 8$; then we can calculate the common e-atom of the two numbers as $a_{\downarrow} = 3^2 \cdot 5 = b_{\downarrow} = d$, which is, for example, the value a with the exponents of its factorization divided by $k = 6$. Then $a = d^6$ and $b = d^8$ and, using the notation of the theorem, we calculate $m = gcd(6, 8) = 2$, and we can conclude that the least upper bound is $\{3^4 \cdot 5^2\}^{\mathbb{N}}$.
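The least upper bound of Theorem 3 differs from the intersection only in using gcd instead of lcm on the exponents, and in returning the top element when the e-atoms differ. A minimal Python sketch for positive integer bases $a \ge 2$ and exponent set $\mathbb{N}$ follows; the names are hypothetical, and `None` stands for the top element $\mathbb{Z}$.

```python
from functools import reduce
from math import gcd
from typing import Optional


def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def e_atom(a: int) -> tuple[int, int]:
    """(atom, k) with a == atom**k and k maximal; assumes a >= 2."""
    exps = factorize(a)
    k = reduce(gcd, exps.values())
    atom = 1
    for p, m in exps.items():
        atom *= p ** (m // k)
    return atom, k


def join(a: int, b: int) -> Optional[int]:
    """Base of {a}^N ∨ {b}^N (Theorem 3); None stands for the top Z."""
    d1, k = e_atom(a)
    d2, h = e_atom(b)
    if d1 != d2:
        return None
    return d1 ** gcd(k, h)


print(join(3**12 * 5**6, 3**16 * 5**8))  # 2025, i.e. 3**4 * 5**2
print(join(4, 27))                       # None
```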




Fig. 1. Integer’s power domain P(Z, {N}). 
5 Abstracting the Exponent Set

In this section we enhance our domain in order to better characterize the exponent set of numerical powers. This is achieved by observing that, in Definition 2, E may be any collection of sets of integers or natural numbers. In particular, E can be an abstraction of $\wp(\mathbb{N})$ or $\wp(\mathbb{Z})$. The resulting domain is more precise than the one described above, because it allows us to analyze numerical powers by combining the domain of Section 4 for analyzing the bases with other abstract domains, like interval analysis, congruence analysis, etc., for analyzing the exponents. As we will see later on, this generalization is not always possible, and some restrictions on the structure of E have to be taken into account. Assume that $E_{\mathbb{N}} \in uco(\wp(\mathbb{N}))$ and $E_{\mathbb{Z}} \in uco(\wp(\mathbb{Z}))$ are abstract domains for natural and integer values respectively, and $E \in \{E_{\mathbb{X}} \mid \mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}\}$. The main problem with this generalization is that if $X \in E$ and $k \in \mathbb{N}$ (or $k \in \mathbb{Z}$), then in general kX may not belong to E. In order to obtain a Moore family we need the intersection of numerical powers $\{a\}^X$ and $\{b\}^Y$ to have the form $\{c\}^Z$ with $Z \in E$. The following definition introduces the notion of exponential-closed domains. These domains ensure that the abstraction of the exponent in a non-trivial domain E (i.e. where $E \neq \{\mathbb{N}\}, \{\mathbb{Z}\}$) is a Moore family, and therefore an abstraction of $\wp(\mathbb{B})$.



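Definition 3. Let $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$. A Moore family $E_{\mathbb{X}}$ is called an exponential-closed domain if $\forall X, Y \in E_{\mathbb{X}}\ \forall k, h \in \mathbb{X}\ .\ \exists z \in \mathbb{X}\ \exists W \in E_{\mathbb{X}}\ .\ kX \cap hY = zW$. It is called infinitely exponential-closed if this also holds for infinite intersections.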




It is immediate to prove, by Lemma 2, that if $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$ then $\{\mathbb{X}\}$ is an exponential-closed domain. However, this condition does not hold in general for abstract domains. Consider the abstract domain in Figure 2. We can easily verify that $4 \cdot [0, 4] \cap 2 \cdot [6, 9] = 4 \cdot [3, 4]$ (both sides equal {12, 16}), which is not a multiple of any element of the domain. In the following we assume that $E \in \{E_{\mathbb{X}} \in uco(\wp(\mathbb{X})) \mid \mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}\}$ is an exponential-closed domain.
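This counterexample can be checked mechanically. In the Python sketch below, the hypothetical helper `scaled(z, lo, hi)` builds $z \cdot [lo, hi]$ as a finite set. The search shows that, within the small range explored, the only way to write the intersection as $z \cdot W$ with W an interval is $4 \cdot [3, 4]$, so the intersection is representable only if [3, 4] is an element of the exponent domain. The figure's domain itself is not reproduced here, so membership in it is not checked.

```python
def scaled(z: int, lo: int, hi: int) -> set[int]:
    """z·[lo, hi] = {z*i | lo <= i <= hi}, a scaled integer interval."""
    return {z * i for i in range(lo, hi + 1)}


lhs = scaled(4, 0, 4) & scaled(2, 6, 9)
print(sorted(lhs))  # [12, 16]

# Every way of writing lhs as z·[lo, hi] with z >= 1 (small search space).
witnesses = [(z, lo, hi)
             for z in range(1, 17)
             for lo in range(0, 21)
             for hi in range(lo, 21)
             if scaled(z, lo, hi) == lhs]
print(witnesses)  # [(4, 3, 4)]
```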
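Lemma 3. Let $a, b \in \mathbb{B}$ be atomic and $X, Y \in E$; then

$\{a\}^X \cap \{b\}^Y = \begin{cases} \{a\}^{X \cap Y} & \text{if } a = b \wedge X \cap Y \neq \emptyset \\ \{1\} & \text{if } 0 \in X \cap Y \wedge a \neq b \\ \emptyset & \text{otherwise} \end{cases}$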

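Theorem 4. Let $a, b \in \mathbb{B}$ and $X, Y \in E$; then $\{a\}^X \cap \{b\}^Y = \{a_{\downarrow}\}^{kX} \cap \{b_{\downarrow}\}^{hY}$, where $k, h \in \mathbb{N}$ are such that $a = a_{\downarrow}^k$ and $b = b_{\downarrow}^h$.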

As far as the least upper bound operation is concerned, we consider the following lemma.

Lemma 4. Let $a, b \in \mathbb{B}$ be atomic and $X, Y \in E$. Then

otherwise 

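Theorem 5. Let $a, b \in \mathbb{B}$ and $X, Y \in E$; if $a = a_{\downarrow}^k$ and $b = b_{\downarrow}^h$ for some $h, k \in \mathbb{N}$, then $\{a\}^X \vee \{b\}^Y = \{a_{\downarrow}\}^{kX} \vee \{b_{\downarrow}\}^{hY}$.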

6 Basic Domain Properties 

At this point we know that $P(\mathbb{B}, E)$, when E is an exponential-closed abstract domain, is a lattice ordered by inclusion. In the following we prove that the domains built so far do not have infinite ascending chains when E is ACC, but they can have infinite descending chains. Moreover, we give a characterization of the meet-irreducible elements of $P(\mathbb{B}, E)$.

Lemma 5. Let $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$; then $P(\mathbb{B}, \{\mathbb{X}\})$ is ACC.

Proof. To prove that this lattice is ACC it is sufficient to prove that each element has a finite number of upper bounds.

Now we prove that if $a \in \mathbb{B}$ then $(\{a\}^{\mathbb{X}})^u = \{\{b\}^{\mathbb{X}} \mid a \in \{b\}^{\mathbb{X}}\}$. We can see that:

$\{c\}^{\mathbb{X}} \in (\{a\}^{\mathbb{X}})^u \iff \{a\}^{\mathbb{X}} \subseteq \{c\}^{\mathbb{X}} \iff \{1, a, a^2, \dots\} \subseteq \{c\}^{\mathbb{X}} \iff a \in \{c\}^{\mathbb{X}} \iff \{c\}^{\mathbb{X}} \in \{\{b\}^{\mathbb{X}} \mid a \in \{b\}^{\mathbb{X}}\}$

Now it is clear that, if $\{c\}^{\mathbb{X}} \in \{\{b\}^{\mathbb{X}} \mid a \in \{b\}^{\mathbb{X}}\}$ and all the values are integers, then there exists $k \in \mathbb{X}$ such that $a = c^k$, and so $c \mid a$. Because the number of divisors of an integer is finite, $|(\{a\}^{\mathbb{X}})^u| = |\{\{b\}^{\mathbb{X}} \mid a \in \{b\}^{\mathbb{X}}\}| < \omega$, and so we can say that the lattice is ACC. The same can be concluded when the values are rational, by the same reasoning, because the elements which are greater than a fraction a/b reduced to lowest terms are all the fractions c/d, reduced to lowest terms, such that $c \mid a$ and $d \mid b$. This is because $a \nmid b \wedge b \nmid a$ implies $\forall k \in \mathbb{N}.\ a^k \nmid b^k \wedge b^k \nmid a^k$.
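The finiteness argument can be illustrated by brute force. The Python sketch below (function names hypothetical) lists the positive bases $c \ge 2$ with $a \in \{c\}^{\mathbb{N}}$, i.e. the non-top upper bounds of $\{a\}^{\mathbb{N}}$ with positive base; following the proof, only divisors of a need to be examined. Negative bases and the top element are deliberately left out of this sketch.

```python
def is_power_of(a: int, c: int) -> bool:
    """True iff a == c**m for some m >= 0 (assumes a >= 1, c >= 2)."""
    while a % c == 0:
        a //= c
    return a == 1


def upper_bound_bases(a: int) -> list[int]:
    """Positive bases c >= 2 with {a}^N ⊆ {c}^N, i.e. a in {c}^N.

    Every such c divides a, so the search is restricted to divisors of a,
    which is why the set of upper bounds is finite.
    """
    return [c for c in range(2, a + 1) if a % c == 0 and is_power_of(a, c)]


print(upper_bound_bases(64))    # [2, 4, 8, 64]
print(upper_bound_bases(1728))  # [12, 1728]
```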
The following result relates the meet-irreducibility of numerical powers to the meet-irreducibility of their exponent sets.

Lemma 6. If $a \in \mathbb{B}$ is atomic and $X \in E$, then $\{a\}^X$ is meet-irreducible if and only if X is meet-irreducible in E.



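Proposition 2.

1. If E is ACC then $P(\mathbb{B}, E)$ is ACC.

2. $P(\mathbb{B}, E)$ is not DCC.

3. Consider $a \in \mathbb{B}$, $k, h, w \in \mathbb{N}$ and $X, Y, Z \in E$; then
$Mirr(P(\mathbb{B}, E)) = \{\{a\}^X \mid ((a = a_{\downarrow}^k \wedge kX = hY \cap wZ) \Rightarrow (kX = hY \vee kX = wZ)) \vee a = 0\}$.

4. $P(\mathbb{B}, E) = \mathcal{M}(Mirr(P(\mathbb{B}, E)))$.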



In order to prove that $P(\mathbb{B}, E)$, with E being an exponential-closed domain, is an abstraction of $\wp(\mathbb{B})$, we have to find a GI between these two domains. Consider $\alpha : \wp(\mathbb{B}) \to P(\mathbb{B}, E)$ and $\gamma : P(\mathbb{B}, E) \to \wp(\mathbb{B})$ defined as follows:

$\alpha(Y) = \bigcap \{\{a\}^X \mid X \in E \wedge Y \subseteq \{a\}^X\}$

$\gamma(\{a\}^X) = \{a\}^X$
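For the simplest instance $E = \{\mathbb{N}\}$ and finite sets of positive integers, α can be computed without enumerating the candidate sets $\{a\}^X$: every $y \ge 2$ in Y must share the same e-atom d, and the result is $\{d^g\}^{\mathbb{N}}$, where g is the gcd of their exponents. This characterization is our own reading of the definition (it follows from $\alpha(\{y\}) = \{y\}^{\mathbb{N}}$ together with Theorem 3), and the names in the sketch are hypothetical; `None` stands for the top element $\mathbb{Z}$.

```python
from functools import reduce
from math import gcd
from typing import Iterable, Optional


def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def e_atom(a: int) -> tuple[int, int]:
    """(atom, k) with a == atom**k and k maximal; assumes a >= 2."""
    exps = factorize(a)
    k = reduce(gcd, exps.values())
    atom = 1
    for p, m in exps.items():
        atom *= p ** (m // k)
    return atom, k


def alpha(Y: Iterable[int]) -> Optional[int]:
    """Base c with alpha(Y) = {c}^N, for a non-empty finite set of integers >= 1."""
    Y = list(Y)
    if not Y:
        raise ValueError("this sketch only handles non-empty Y")
    atoms, exponents = set(), []
    for y in Y:
        if y == 1:
            continue  # 1 = c**0 lies in every {c}^N
        d, k = e_atom(y)
        atoms.add(d)
        exponents.append(k)
    if not exponents:
        return 1  # Y = {1}, and {1}^N = {1}
    if len(atoms) > 1:
        return None  # no single {c}^N contains Y: top element
    (d,) = atoms
    return d ** reduce(gcd, exponents)


print(alpha({1, 16, 64}))  # 4, i.e. {1, 4, 16, 64, ...}
print(alpha({144, 1728}))  # 12
print(alpha({4, 27}))      # None
```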



The following lemmas prove that α is well defined whenever $E \in \{\{\mathbb{N}\}, \{\mathbb{Z}\}\}$ or E is an infinitely exponential-closed domain, respectively.

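Lemma 7. Consider $Y \in \wp(\mathbb{B})$, $a \in \mathbb{B}$ and $\mathbb{X} \in \{\mathbb{N}, \mathbb{Z}\}$; then

$|\{\{a\}^{\mathbb{X}} \mid Y \subseteq \{a\}^{\mathbb{X}}\}| < \omega$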

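Proof. Let $y \in Y$; by Lemma 5 we can say that $\{\{a\}^{\mathbb{X}} \mid y \in \{a\}^{\mathbb{X}}\} = (\{y\}^{\mathbb{X}})^u$ and $|\{\{a\}^{\mathbb{X}} \mid y \in \{a\}^{\mathbb{X}}\}| < \omega$. Now we know that $Y \subseteq \{a\}^{\mathbb{X}}$ implies $y \in \{a\}^{\mathbb{X}}$, so we can write $\{\{a\}^{\mathbb{X}} \mid Y \subseteq \{a\}^{\mathbb{X}}\} \subseteq \{\{a\}^{\mathbb{X}} \mid y \in \{a\}^{\mathbb{X}}\}$; because the second of these sets is finite, the first one must also be finite.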



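Lemma 8. Consider the families $\{a_i\}_{i \in I} \subseteq B$ and $\{X_i\}_{i \in I} \subseteq E$; then $\bigcap_{i \in I} \{a_i\}^{X_i} = \{a\}^{kX}$ for some $k \in \mathbb{N}$ and $X \in E$.

Proof. Consider $a_i = a^{k_i}$ with $k_i \in \mathbb{N}$ (if the family contains at least two values with different e-atoms, then the intersection is empty):

$$\bigcap_{i \in I} \{a_i\}^{X_i} = \{ a^h \mid \forall i \in I.\ h \in k_i X_i \} = \Big\{ a^h \;\Big|\; h \in \bigcap_{i \in I} k_i X_i \Big\} = \{ a^h \mid h \in kX \} = \{a\}^{kX}$$

because we supposed that $E$ is infinitely exponential-closed.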

Therefore, $P(B,E)$ is a Moore family of $\wp(B)$ if $E$ is an infinitely exponential-closed domain or if it is $\{\mathbb{N}\}$ or $\{\mathbb{Z}\}$. The following theorem follows from Lemmas 7 and 8.

Theorem 6. Let $E$ be an infinitely exponential-closed domain or $E \in \{\{\mathbb{N}\}, \{\mathbb{Z}\}\}$. Then $(P(B,E), \alpha, \gamma, \wp(B))$ is a Galois insertion.

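Now that we have described the structure of the domain, it is useful to consider a programming language with standard arithmetic operations on integer or rational numbers, and to define their abstract interpretation as follows:

— Sum $\oplus$: for all $a,b \in B$ and $X,Y \in E$, $\{a\}^X \oplus \{b\}^Y = B$.

— Product $\odot$: for all $a,b \in B$ and $X,Y,Z \in E$, $\{a\}^X \odot \{b\}^Y = \{c\}^Z$ where, if $a$ and $b$ have the same e-atom $d$, with $a = d^k$ and $b = d^h$, and $zW = \bigcap\{ zW \mid kX + hY \subseteq zW \}$ (with $+$ the pointwise sum of sets), then $c = d^z$ and $Z = W$. If $a$ and $b$ have different e-atoms, then $\{a\}^X \odot \{b\}^Y = B$.

— Division $\div$: for all $a,b \in \mathbb{Z}$ and $X,Y \in E_{\mathbb{N}}$, $\{a\}^X \div \{b\}^Y = \mathbb{Z}$.

— Modulo $\mathrm{mod}$: for all $a,b \in \mathbb{Z}$ and $X,Y \in E_{\mathbb{N}}$, $\{a\}^X \bmod \{b\}^Y = \mathbb{Z}$.

— Exponent $\mathcal{E}$: for all $a \in \mathbb{Z}$, $b \in \mathbb{Z}^{+}$ and $X,Y \in E_{\mathbb{N}}$, $\mathcal{E}(\{a\}^X, \{b\}^Y) = \{a\}^{\mathbb{N}}$ if $\mathbb{N} \in E_{\mathbb{N}}$, and $\mathcal{E}(\{a\}^X, \{b\}^Y) = B$ otherwise.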

Now we can prove the correctness of these operations. In particular, we can prove that $\alpha \circ op \circ \gamma = op^{\sharp}$, namely that these abstract operations are the optimal abstractions of the corresponding concrete ones on the concrete domain $\wp(B)$, where they are defined.

Proposition 3. $\oplus$, $\odot$, $\div$, $\mathrm{mod}$, and $\mathcal{E}$ are optimal.

7 Exponential-Closed Domains

In the following we prove that well-known domains for the analysis of integer or natural numbers are infinitely exponential-closed, and can therefore be used to approximate the properties of the exponent set. We consider Granger's domain of congruences $C(\mathbb{Z}) = \{a\mathbb{Z} + b \mid a, b \in \mathbb{Z}\}$, Cousot's domain of intervals $Int(\mathbb{Z}) = \{[l,u] \mid l,u \in \mathbb{Z} \cup \{-\infty,+\infty\},\ l \le u\} \cup \{\bot\}$ and its restriction to $\mathbb{N}$, $Int(\mathbb{N})$, and the domain for numerical power analysis described above. The first domain allows us to analyze numerical powers whose exponents are of the form $m\mathbb{Z} + n$, with $n, m \in \mathbb{Z}$. The second domain allows us to analyze the range of the exponent set, in $\mathbb{N}$ or in $\mathbb{Z}$. Finally, the third one, $P(\mathbb{Z},\{\mathbb{N}\})$, analyzes the property of the exponent set of being itself a set of numerical powers.

Proposition 4. $C(\mathbb{Z})$, $Int(\mathbb{Z})$, $P(\mathbb{Z}, \{\mathbb{N}\}) \in uco(\mathbb{Z})$ and $Int(\mathbb{N}) \in uco(\mathbb{N})$ are infinitely exponential-closed domains.

In the following we consider some examples of intersection and least upper bound with different exponent domains. As far as intervals are concerned, note that if $X \in \{\mathbb{N}, \mathbb{Z}\}$ and $a,b,c \in X$, then $a[b,c] = aX \cap [ab, ac]$. In other words, multiplying a value by an interval is equivalent to considering all the multiples of that value inside the scaled interval.

Example f. Consider B = Z, E = Int{N) and a, 6 G B. We have to determine 
c G B such that {c}^ = {a}I^’^°l fl where a = 144 = 2"^ • 3^ and b — 
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Fig. 3. Examples of intersection in P(Z, 



1728 = 2® • 3^. We have that k = gcd(2,4) = 2 so = 2^ • 3 = 12 Moreover 
h = gcd(6,S) = 3 so = 12, hence the two values have the same e-atom and 
the intersection exists. We can easily see that a = 12^, b = 12^. Now we can find 
the intersection as [ 12 . 20 ] ^ 

can conclude that c = 12® and X = [2,3], namely fl {6}[^d5] _ {c}[2.3]. 

It’s clear that if the exponent sets of the sets of powers represented by their 
e-atoms, are disjoint then, independently from their value, the intersection must 
be empty. Consider {8}[®’i°ln{16}I®’i®l = {2}3Nn[i5.30]n{2}4'^®^[®2-60l = 0 because 
[15, 30] n [32, 60] = 0 (Fig.|3). 
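The e-atom computation used in Example 4 can be sketched in Java: factor a, take the gcd k of its prime exponents, and divide every exponent by k. This is an illustrative sketch under our own assumptions (positive integers only, factorization by trial division). The class EAtom and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch (hypothetical names): e-atom d and exponent k with a = d^k, for integers a >= 2.
public class EAtom {
    /** Prime factorization by trial division: prime -> exponent. */
    static Map<Long, Integer> factor(long a) {
        Map<Long, Integer> f = new TreeMap<>();
        for (long p = 2; p * p <= a; p++) {
            while (a % p == 0) {
                f.merge(p, 1, Integer::sum);
                a /= p;
            }
        }
        if (a > 1) f.merge(a, 1, Integer::sum); // remaining prime factor
        return f;
    }

    static int gcd(int x, int y) { return y == 0 ? x : gcd(y, x % y); }

    /** Returns {d, k} where k is the gcd of the prime exponents of a and a = d^k. */
    public static long[] atom(long a) {
        Map<Long, Integer> f = factor(a);
        int k = 0;
        for (int e : f.values()) k = gcd(k, e);
        long d = 1;
        for (Map.Entry<Long, Integer> en : f.entrySet()) {
            for (int i = 0; i < en.getValue() / k; i++) d *= en.getKey();
        }
        return new long[] {d, k};
    }

    public static void main(String[] args) {
        long[] a = atom(144);   // 144 = 2^4 * 3^2
        long[] b = atom(1728);  // 1728 = 2^6 * 3^3
        System.out.println("144 = " + a[0] + "^" + a[1]);
        System.out.println("1728 = " + b[0] + "^" + b[1]);
        System.out.println("same e-atom: " + (a[0] == b[0]));
    }
}
```

With these inputs, atom(144) returns {12, 2} and atom(1728) returns {12, 3}, which matches the common e-atom 12 found above.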



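Example 5. Consider $B = \mathbb{Q}$ and $E = C(\mathbb{Z})$. Let $p = 5^4/54^2$ and $q = 54^3/5^6$. We want to find $\{c\}^X = \{p\}^{2\mathbb{Z}+1} \cap \{q\}^{5\mathbb{Z}+2}$. In order to have all powers represented by values greater than one, note that $\{p\}^{2\mathbb{Z}+1} = \{54^2/5^4\}^{2\mathbb{Z}+1}$, because $2\mathbb{Z}+1$ is closed under negation.

Now we can see that $k = \gcd(4,2) = 2$ and $h = \gcd(6,3) = 3$. Hence $54^2/5^4 = p_e^2$ and $q = p_e^3$, where $p_e = 54/5^2$ is the common e-atom. Now we can find the intersection of the two sets as

$$\{p_e\}^{4\mathbb{Z}+2} \cap \{p_e\}^{15\mathbb{Z}+6} = \{p_e\}^{60\mathbb{Z}+66}$$

because $60\mathbb{Z}+66 = (4\mathbb{Z}+2) \cap (15\mathbb{Z}+6)$. Recall that $(k_1\mathbb{Z}+h_1) \cap (k_2\mathbb{Z}+h_2) = x + \mathrm{lcm}(k_1,k_2)\mathbb{Z}$ for any $x \in (k_1\mathbb{Z}+h_1) \cap (k_2\mathbb{Z}+h_2) \neq \emptyset$. Hence $c = p_e^6$ and $X = 10\mathbb{Z}+11$.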
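The congruence intersection recalled in Example 5 can be computed with the extended Euclidean algorithm. This is the standard generalization of the Chinese remainder theorem to non-coprime moduli, used here as an illustration rather than as the paper's own procedure. The Java sketch below assumes positive moduli; Congruence and intersect are hypothetical names.

```java
// Sketch (hypothetical names): (k1 Z + h1) ∩ (k2 Z + h2) for k1, k2 > 0.
public class Congruence {
    /** Extended Euclid: returns {g, s, t} with s*a + t*b = g = gcd(a, b). */
    static long[] egcd(long a, long b) {
        if (b == 0) return new long[] {a, 1, 0};
        long[] r = egcd(b, a % b);
        return new long[] {r[0], r[2], r[1] - (a / b) * r[2]};
    }

    /** Returns {m, x} such that the intersection is m Z + x with 0 <= x < m, or null if empty. */
    public static long[] intersect(long k1, long h1, long k2, long h2) {
        long[] e = egcd(k1, k2);
        long g = e[0];
        if (Math.floorMod(h2 - h1, g) != 0) return null;   // no common element
        long m = k1 / g * k2;                              // lcm(k1, k2)
        // Solve k1 * t ≡ h2 - h1 (mod k2); then x = h1 + k1 * t is in both classes.
        long t = Math.floorMod(e[1] * ((h2 - h1) / g), k2 / g);
        long x = Math.floorMod(h1 + k1 * t, m);
        return new long[] {m, x};
    }

    public static void main(String[] args) {
        long[] r = intersect(4, 2, 15, 6);                 // (4Z + 2) ∩ (15Z + 6)
        System.out.println(r[0] + "Z + " + r[1]);
    }
}
```

For the classes of Example 5, intersect(4, 2, 15, 6) returns {60, 6}, i.e. 60Z + 6. This is the same class as 60Z + 66 = 6(10Z + 11).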




Fig. 4. Examples of least upper bound in $P(\mathbb{Q}, Int(\mathbb{Z}))$.



Example 6. Consider $a, b \in B = \mathbb{Q}$ and $E = Int(\mathbb{Z})$. We have to find $\{c\}^X = \{1/32\}^{[2,10]} \vee \{8\}^{[20,25]}$. First of all, we rewrite the two sets so that both are expressed as powers of a common e-atom. We note that $\{1/32\}^{[2,10]} = \{32\}^{[-10,-2]} = \{2\}^{5\mathbb{Z} \cap [-50,-10]}$; similarly, $\{8\}^{[20,25]} = \{2\}^{3\mathbb{Z} \cap [60,75]}$. Then we see that the common e-atom of the two numbers is 2, and therefore

$$\{1/32\}^{[2,10]} \vee \{8\}^{[20,25]} = \{2\}^{5\mathbb{Z} \cap [-50,-10]} \vee \{2\}^{3\mathbb{Z} \cap [60,75]} = \{2\}^{[-50,75]}$$

because $[-50,75] = (5\mathbb{Z} \cap [-50,-10]) \vee (3\mathbb{Z} \cap [60,75])$ in $Int(\mathbb{Z})$, i.e. the smallest interval containing both exponent sets. We can conclude that the least upper bound is $\{2\}^{[-50,75]}$ (Fig. 4).



8 Numerical Power Program Analysis

In this section we apply the numerical power domains to the static analysis of programs by abstract interpretation. The following example shows the static analysis of numerical powers in $P(\mathbb{Z}, \{\mathbb{N}\})$ of a simple program fragment.

Example 7. Consider the following program fragment:

          n := 1; m := 4;
    π1:   while n < 10000 do
    π2:       m := exp(n, m);
    π3:       n := 2 * n * m
    π4:   endw

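The concrete semantics $\mathcal{S}: \Pi \to (Var \to \wp(\mathbb{Z}))$ is a function associating with each program point $\pi$ a concrete state $\sigma \in Var \to \wp(\mathbb{Z})$. The following recursive equation defines the semantics of program point $\pi_2$, where $\dot{\cup}$ is the point-wise extension of set union to functions:

$$\mathcal{S}_{\pi_2} = \mathcal{S}_{\pi_1} \,\dot{\cup}\, \big[\, m \mapsto \exp(\mathcal{S}_{\pi_2}(n), \mathcal{S}_{\pi_2}(m)),\ n \mapsto 2 \cdot \mathcal{S}_{\pi_2}(n) \cdot \exp(\mathcal{S}_{\pi_2}(n), \mathcal{S}_{\pi_2}(m)) \,\big]$$

The abstract semantics $\mathcal{S}^\sharp: \Pi \to (Var \to P(\mathbb{Z}, \{\mathbb{N}\}))$ is the least fix-point of the equation

$$\mathcal{S}^\sharp_{\pi_2} = \mathcal{S}^\sharp_{\pi_1} \,\dot{\vee}\, \big[\, m \mapsto \mathcal{E}(\mathcal{S}^\sharp_{\pi_2}(n), \mathcal{S}^\sharp_{\pi_2}(m)),\ n \mapsto \{2\}^{\mathbb{N}} \odot \mathcal{S}^\sharp_{\pi_2}(n) \odot \mathcal{E}(\mathcal{S}^\sharp_{\pi_2}(n), \mathcal{S}^\sharp_{\pi_2}(m)) \,\big] \tag{2}$$

The solution is obtained in three steps as follows:

$$\begin{aligned}
\mathcal{S}^{\sharp(1)}_{\pi_2} &= [\, m \mapsto \{4\}^{\mathbb{N}},\ n \mapsto \{1\}^{\mathbb{N}} \,] \\
\mathcal{S}^{\sharp(2)}_{\pi_2} &= [\, m \mapsto \{4\}^{\mathbb{N}},\ n \mapsto \{2\}^{\mathbb{N}} \,] \\
\mathcal{S}^{\sharp(3)}_{\pi_2} &= [\, m \mapsto \{2\}^{\mathbb{N}},\ n \mapsto \{2\}^{\mathbb{N}} \,] \quad \text{(fix-point)}
\end{aligned}$$

As far as variable $m$ is concerned: $\mathcal{E}(\{2\}^{\mathbb{N}}, \{4\}^{\mathbb{N}}) = \{2\}^{\mathbb{N}}$ and $\{4\}^{\mathbb{N}} \vee \{2\}^{\mathbb{N}} = \{2\}^{\mathbb{N}}$.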
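As a concrete sanity check of the abstract result, the fragment of Example 7 can be executed with arbitrary-precision integers. This is not a substitute for the analysis. The sketch assumes that exp(n, m) denotes n^m, and the class name Example7 is hypothetical. Every value observed for m and n after each iteration should lie in {2}^N.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical names): concrete run of Example 7, assuming exp(n, m) = n^m.
public class Example7 {
    /** x is in {2}^N (x = 2^j, j >= 0) iff x is positive and has exactly one bit set. */
    public static boolean isPowerOfTwo(BigInteger x) {
        return x.signum() > 0 && x.bitCount() == 1;
    }

    /** Runs the fragment and records {m, n} at the end of each iteration. */
    public static List<BigInteger[]> run() {
        List<BigInteger[]> trace = new ArrayList<>();
        BigInteger two = BigInteger.valueOf(2);
        BigInteger n = BigInteger.ONE;                 // n := 1
        BigInteger m = BigInteger.valueOf(4);          // m := 4
        BigInteger bound = BigInteger.valueOf(10000);
        while (n.compareTo(bound) < 0) {               // while n < 10000 do
            m = n.pow(m.intValueExact());              //   m := exp(n, m)
            n = two.multiply(n).multiply(m);           //   n := 2 * n * m
            trace.add(new BigInteger[] {m, n});
        }
        return trace;
    }

    public static void main(String[] args) {
        for (BigInteger[] s : run()) {
            // bitLength() - 1 is the exponent j when the value is 2^j
            System.out.println("m = 2^" + (s[0].bitLength() - 1) + ", n = 2^" + (s[1].bitLength() - 1)
                    + ", both in {2}^N: " + (isPowerOfTwo(s[0]) && isPowerOfTwo(s[1])));
        }
    }
}
```

The loop runs four times, with (m, n) = (2^0, 2^1), (2^1, 2^3), (2^6, 2^10) and (2^640, 2^651), all in {2}^N. BigInteger is needed because the last value of m is far beyond the range of long.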
It is worth noting that most abstract operations, in particular integer division, lose precision because the values represented by numerical powers are unbounded. This can be overcome by combining $P(\mathbb{Z},\{\mathbb{N}\})$ with the interval domain $Int(\mathbb{Z})$. We consider the reduced product domain $P_I(\mathbb{Z},\{\mathbb{N}\}) = P(\mathbb{Z},\{\mathbb{N}\}) \sqcap Int(\mathbb{Z})$. In this case, the abstract object $(\{a\}^{\mathbb{N}}, [l,u]) \in P_I(\mathbb{Z},\{\mathbb{N}\})$ represents the concrete set of integers $\{x \mid l \le x = a^n \le u,\ n \in \mathbb{N}\}$. For instance, integer division can be improved as follows:






Otherwise 



a and c'^ = b 



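Example 8. Consider the Collatz program discussed earlier and the abstract operations $ev: P_I(\mathbb{Z},\{\mathbb{N}\}) \to \{tt, ff, \top\}$ and $\phi: \{tt, ff, \top\} \times P_I(\mathbb{Z},\{\mathbb{N}\}) \times P_I(\mathbb{Z},\{\mathbb{N}\}) \to P_I(\mathbb{Z},\{\mathbb{N}\})$ such that:

$$ev(\{a\}^{\mathbb{N}}, [l,u]) = \begin{cases} tt & \text{if } a \text{ is even} \\ ff & \text{if } a \text{ is odd} \\ \top & \text{otherwise} \end{cases}$$

$$\phi\big(x, (\{a\}^{\mathbb{N}}, [l,u]), (\{a'\}^{\mathbb{N}}, [l',u'])\big) = \begin{cases} (\{a\}^{\mathbb{N}}, [l,u]) & \text{if } x = tt \\ (\{a'\}^{\mathbb{N}}, [l',u']) & \text{if } x = ff \\ (\{a\}^{\mathbb{N}} \vee \{a'\}^{\mathbb{N}}, [\min(l,l'), \max(u,u')]) & \text{if } x = \top \end{cases}$$

In this case we obtain the abstract semantics for program point $\pi_2$ as in the following equation:

$$\mathcal{S}^\sharp_{\pi_2}(n) = \mathcal{S}^\sharp_{\pi_1}(n) \vee \Big[\phi\big(ev(\mathcal{S}^\sharp_{\pi_2}(n)),\ \mathcal{S}^\sharp_{\pi_2}(n) \div (\{2\}^{\mathbb{N}}, [2,2]),\ (\mathbb{Z}, [-\infty,+\infty])\big) \wedge (\mathbb{Z}, [2,+\infty])\Big] \tag{3}$$

It is immediate to prove that if $\mathcal{S}^\sharp_{\pi_1}(n) = (\{a\}^{\mathbb{N}}, [l,u])$ with $a \in \{2\}^{\mathbb{N}}$ and $l > 2$, then $\mathcal{S}^\sharp_{\pi_2}(n) = (\{2\}^{\mathbb{N}}, [2,u])$ is a fix-point of Eq. (3).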
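The fix-point claimed in Example 8 can also be checked concretely. The Collatz program is not reproduced in this section, so the sketch assumes the usual loop: while n > 1, if n is even then n := n / 2, else n := 3n + 1. CollatzCheck is a hypothetical name. Starting from a power of two, every value that reaches the loop body should be a power of two in [2, n0], as the abstract value ({2}^N, [2, u]) predicts.

```java
// Sketch (hypothetical names), assuming the usual Collatz loop.
public class CollatzCheck {
    static boolean isPowerOfTwo(long x) { return x > 0 && (x & (x - 1)) == 0; }

    /** True if every n that enters the loop body is a power of two in [2, n0]. */
    public static boolean staysInTwoPowers(long n0) {
        long n = n0;
        while (n > 1) {                                  // guard corresponds to (Z, [2, +inf])
            if (!isPowerOfTwo(n) || n < 2 || n > n0) return false;
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;        // even branch: n / 2; odd branch: 3n + 1
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(staysInTwoPowers(1L << 20)); // starts in {2}^N
        System.out.println(staysInTwoPowers(6));        // 6 is not a power of two
    }
}
```

When the start value is a power of two, only the even branch is ever taken. This is why the odd branch, which is abstracted to $(\mathbb{Z}, [-\infty,+\infty])$, does not spoil the result.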

8.1 Static Analysis of Randomized ccp Programs

Rational power analysis is particularly appropriate for approximating invariants concerning the probability of randomized programs. In particular, the abstract objects in $P(\mathbb{Q}, E_{\mathbb{N}})$ are suitable for approximating the probability component of computations involving randomized choices. This is because the accumulated probability is usually obtained as the product of the step-by-step probabilities of each transition, and the product operation is extremely precise in $P(\mathbb{Q}, E_{\mathbb{N}})$, with $E_{\mathbb{N}} \in \{\{\mathbb{N}\}, Int(\mathbb{N})\}$. Moreover, the abstract domain $E_{\mathbb{N}}$ may characterize the property of the exponents of the computed probability. The following proposition specifies the abstract product operation in the two cases described above.



Proposition 5. Consider $P(\mathbb{Q}, E_{\mathbb{N}})$ with $E_{\mathbb{N}} \in \{\{\mathbb{N}\}, Int(\mathbb{N})\}$, and $a, b \in \mathbb{Q}$ with the same e-atom $d$, $a = d^k$ and $b = d^h$ for some $h, k \in \mathbb{N}$, and let $g = \gcd(k, h)$. Then

1. If $E_{\mathbb{N}} = \{\mathbb{N}\}$ then $\{a\}^{\mathbb{N}} \odot \{b\}^{\mathbb{N}} = \{d^g\}^{\mathbb{N}}$.

2. If $E_{\mathbb{N}} = Int(\mathbb{N})$ then $\{a\}^{[l_1,u_1]} \odot \{b\}^{[l_2,u_2]} = \{d^g\}^{[l_1k' + l_2h',\ u_1k' + u_2h']}$, where $k = k'g$ and $h = h'g$.

Table 1. The syntax and operational semantics of pccp.

    Program ::= Dec . Agent
    Dec     ::= ε | p(x) :- Agent . Dec
    Agent   ::= tell(c) | ∃x. Agent | p(y) | Agent || Agent | □_{i=1..n} ask(c_i) →p_i Agent_i

    R1   ⟨tell(c), σ⟩ →1 ⟨ε, σ ∧ c⟩
    R2   ⟨□_{i=1..n} ask(c_i) →p_i A_i, σ⟩ →p̃_j ⟨A_j, σ⟩        if σ ⊢ c_j, j ∈ [1, n]
    R3   ⟨A, σ⟩ →p ⟨A', σ'⟩  implies  ⟨A || B, σ⟩ →p ⟨A' || B, σ'⟩ and ⟨B || A, σ⟩ →p ⟨B || A', σ'⟩
    R4   ⟨A, d ∧ ∃_x σ⟩ →p ⟨B, e⟩  implies  ⟨∃(x, d).A, σ⟩ →p ⟨∃(x, e).B, σ ∧ ∃_x e⟩
    R5   p(x) :- A ∈ P  implies  ⟨p(y), σ⟩ →1 ⟨∃(x, d_{xy}).A, σ⟩
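Case 2 of Proposition 5 only involves integer arithmetic on the exponents, so it can be sketched directly. The Java code below assumes that e-atoms have already been computed, and IntervalProduct is a hypothetical name. It returns g and the exponent interval of the product, and checks by brute force that every concrete exponent k·x + h·y is covered.

```java
// Sketch (hypothetical names): abstract product of Proposition 5, case E_N = Int(N).
public class IntervalProduct {
    static long gcd(long x, long y) { return y == 0 ? x : gcd(y, x % y); }

    /**
     * a = d^k with exponents in [l1, u1], b = d^h with exponents in [l2, u2].
     * Returns {g, lo, hi}: every product lies in {d^g}^[lo, hi].
     */
    public static long[] product(long k, long l1, long u1, long h, long l2, long u2) {
        long g = gcd(k, h);
        long kp = k / g, hp = h / g;                   // k = k'g and h = h'g
        return new long[] {g, l1 * kp + l2 * hp, u1 * kp + u2 * hp};
    }

    public static void main(String[] args) {
        // a = d^4 with exponents in [1, 3], b = d^6 with exponents in [0, 2]
        long[] r = product(4, 1, 3, 6, 0, 2);
        System.out.println("{d^" + r[0] + "}^[" + r[1] + ", " + r[2] + "]");
        // Soundness check: each concrete exponent 4x + 6y equals g * e with e in [lo, hi].
        for (long x = 1; x <= 3; x++) {
            for (long y = 0; y <= 2; y++) {
                long exp = 4 * x + 6 * y;
                boolean ok = exp % r[0] == 0 && exp / r[0] >= r[1] && exp / r[0] <= r[2];
                if (!ok) throw new AssertionError("unsound for x=" + x + ", y=" + y);
            }
        }
    }
}
```

The resulting interval is a sound over-approximation, not an exact description. Here the exponents 2x + 3y with x ∈ [1, 3] and y ∈ [0, 2] never take the values 3 and 11, although both lie in [2, 12].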



In order to model the situation described above as a static program analysis problem, we consider a probabilistic version of a concurrent constraint calculus, pccp. The syntax and operational semantics of pccp, which have been studied in previous work, are shown in Table 1. The ask-tell paradigm, which is the basis of ccp languages, relies on the notion of blocking ask: a process is suspended when the store does not entail the ask constraint, and it remains suspended until the store entails it. A constraint system is the basic algebraic notion behind ccp. Informally, we have an enumerable set $D$ of elementary assertions and a finite entailment relation $\vdash \subseteq \wp_f(D) \times D$. A constraint system $\mathcal{C}$ is obtained by quotienting sets of assertions by the equivalence $X \sim Y$ iff $(X)^{\vdash} = (Y)^{\vdash}$, where $(X)^{\vdash}$ is the entailment closure of a set of assertions $X$; it is a complete $\omega$-algebraic lattice. In order to treat the hiding operator of the language, a family of unary operations $\exists_x$, called cylindrifications, is introduced. Diagonal elements $d_{x,y}$, i.e. equational constraints between variables, are used to model parameter passing. As usual, the semantics is defined by a transition system $\longrightarrow \subseteq Conf \times \mathbb{Q} \times Conf$, where $C \longrightarrow_p C'$ indicates that the transition from $C$ to $C'$ holds with probability $p \in [0,1]$, and $Conf = Agent \times \mathcal{C}$. Note that in rule R2 an agent $A_j$ is enabled when the store entails the ask constraint $c_j$. The resulting probability $\tilde{p}_j$ is normalized with respect to the enabled agents: $\tilde{p}_j = p_j / \sum_{\sigma \vdash c_i} p_i$. The relational I/O semantics of a pccp program $P$ is then defined as the following function:



(DlD.A]^=^Xe. \ 



do = c A Aq = A 

{Ai^di) (^A^j^i ^ di-i^i') 
The abstract semantics on the GI $(P(\mathbb{Q}, \{\mathbb{N}\}), \alpha, \gamma, \wp(\mathbb{Q}))$ is defined as follows:






do = c A Aq = A 



The following soundness result, which relates concrete and abstract probabilities, follows immediately from abstract interpretation.

Theorem 7. If $(c,p) \in \mathcal{O}[\![P]\!](d)$ then there exists $(c,p^\sharp) \in \mathcal{O}^\sharp[\![P]\!](d)$ such that $p \in \gamma(p^\sharp)$.



8.2 Analyzing Randomized Sources

An important feature of Shannon's information theory is that the measure of information (i.e. entropy) determines the saving in transmission time that proper encoding can achieve, given the statistics of the message source. In order to understand this important result, let us fix a source alphabet $\mathbb{S}$ and an encoding alphabet $\Sigma$, with a one-to-one function $\mathcal{A}: \mathbb{S} \to \Sigma^*$ which returns the encoding of each symbol in $\mathbb{S}$. Suppose that each symbol $s \in \mathbb{S}$ is provided with a corresponding probability $p_s \in [0,1]$. The average length of the code $L(\mathcal{A})$ is

$$L(\mathcal{A}) = \sum_{s \in \mathbb{S}} p_s \cdot |\mathcal{A}(s)|$$

where $|\cdot|$ indicates the length of a sequence. Usually, $L(\mathcal{A}) > H_{\mathbb{S}}$, where $H_{\mathbb{S}}$ is the measure of the information rate (or entropy) of the source:

$$H_{\mathbb{S}} = \sum_{s \in \mathbb{S}} p_s \log_{|\Sigma|} \frac{1}{p_s}$$

The encoding of information is optimal when $L(\mathcal{A}) = H_{\mathbb{S}}$. The question is: can we statically analyze a randomized source in order to determine how to optimally encode the information from that source? A basic result in standard information theory says that $L(\mathcal{A}) = H_{\mathbb{S}}$ holds when, for every $s \in \mathbb{S}$, $p_s = (1/|\Sigma|)^{n_s}$ with $n_s \in \mathbb{N}$. In our setting, where the randomized source is a program with probabilistic choice, this analysis corresponds precisely to statically analyzing the probability of the objects produced by the program (see [11] for a general framework for probabilistic program analysis). When this probability lies in a rational power of type $\{1/a\}^{\mathbb{N}}$, an encoding alphabet with $|\Sigma| = a$ determines an optimal encoding of the source. The result of the analysis is useful for effectively finding an optimal code for the given source. For example, we can use Huffman's algorithm, which takes the source symbols with their probabilities, the cardinality $s$ of the source and the cardinality $t$ of the code alphabet, and returns an optimal code for the symbols of the source. Note that Huffman's algorithm with a fixed encoding alphabet always gives the optimal code for that particular situation. If we do not fix the size of the encoding alphabet, the analysis described above lets us choose the best size: when all the probabilities are powers of the same value, the average code length equals the entropy of the source. In this way, if we take $t = a$, where $a$ is the result of the static analysis of the source, the resulting code has the least possible average length.
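The claim that dyadic probabilities allow an optimal binary code can be illustrated with Huffman's algorithm for a two-symbol encoding alphabet (t = 2). In the Java sketch below, DyadicHuffman is a hypothetical name. The average code length is computed as the sum of the weights of the merged nodes, which equals the sum of p_s times the code-word length. For probabilities in {1/2}^N it coincides with the entropy, up to floating-point rounding.

```java
import java.util.PriorityQueue;

// Sketch (hypothetical names): binary Huffman coding, average length vs. entropy.
public class DyadicHuffman {
    /** Average code length of a binary Huffman code: the sum of all merged weights. */
    public static double averageLength(double[] p) {
        PriorityQueue<Double> q = new PriorityQueue<>();
        for (double x : p) q.add(x);
        double total = 0;
        while (q.size() > 1) {
            double merged = q.poll() + q.poll();  // merge the two least probable nodes
            total += merged;                      // adds one bit to every symbol below this node
            q.add(merged);
        }
        return total;
    }

    /** Entropy with |Σ| = 2: the sum of p * log_2(1/p). */
    public static double entropy(double[] p) {
        double h = 0;
        for (double x : p) h += x * (Math.log(1 / x) / Math.log(2));
        return h;
    }

    public static void main(String[] args) {
        double[] dyadic = {0.5, 0.25, 0.125, 0.125};  // all in {1/2}^N
        System.out.println(averageLength(dyadic) + " vs " + entropy(dyadic));
        double[] other = {0.6, 0.4};                  // not powers of 1/2
        System.out.println(averageLength(other) + " vs " + entropy(other));
    }
}
```

For the dyadic source both quantities are 1.75 bits. For {0.6, 0.4}, whose probabilities are not powers of 1/2, the average length is 1 while the entropy is about 0.971, so the average length exceeds the entropy.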
Example 9. Consider a randomized counting algorithm P expressed in pccp:

    P :  nat(x) :- ask(true) →1/2 tell(x = 0)
                   □ ask(true) →1/2 ∃y (tell(x = s(y)) || nat(y)) . nat(x)

The program P generates an infinite sequence of natural numbers, $\mathcal{O}[\![P]\!](true) = \{(x = s^n(0), 2^{-(n+1)}) \mid n \ge 0\}$, with decreasing probability. This information can be automatically derived by abstract interpretation in $P(\mathbb{Q}, \{\mathbb{N}\})$. The concrete domain is $\wp(\mathbb{N} \times \mathbb{Q})$. It is immediate to derive the following information by abstract interpretation of $\mathcal{O}[\![P]\!]$ in a product domain $T \times P(\mathbb{Q}, \{\mathbb{N}\})$, where $T = \{nat\}$ captures basic type information. In this case, the approximated semantics is $\mathcal{O}^\sharp[\![P]\!](true) = \{(nat, \{1/2\}^{\mathbb{N}})\}$.

9 Conclusions 

We have built a family of abstract domains, with a common structure, which are useful for analyzing numerical (integer or rational) powers. These domains are parametric on the abstraction of the exponent set. Our results so far make it possible to design new abstract domains for numerical power analysis by plugging in suitable abstractions of the exponent set. This construction can be further generalized by considering the possible algebraic structure of the base set B. It is well known that Euclidean rings provide an appropriate algebraic structure that allows the prime factorization of their elements. This is a key point in our construction of P(B, E). However, it is worth noting that N is not a ring, even though it allows prime factorization. Therefore we believe that it is possible to find a more abstract algebraic structure for B such that P(B, E) is an abstraction of ℘(B). This would be particularly important in order to generalize numerical power analysis to powers of other, possibly non-numerical, objects (e.g., polynomials).
Abstract. This paper proposes a run-time bytecode specialization (BCS) technique that analyzes programs and generates specialized programs at run-time in an intermediate language. By using an intermediate language for code generation, a back-end system can optimize the specialized programs after specialization. As the intermediate language, the system uses the Java virtual machine language (JVML), which allows the system to easily achieve practical portability and to use sophisticated just-in-time compilers as its back-end. The binding-time analysis algorithm, which is based on a type system, covers a non-object-oriented subset of JVML. A specializer, which generates programs on a per-instruction basis, can perform method inlining at run-time. The performance measurement showed that a non-trivial application program specialized at run-time by BCS runs approximately 3-4 times faster than the unspecialized one. Despite the large overhead of JIT compilation of specialized code, we observed that the overall performance of the application can be improved.



1 Introduction 

Given a generic program and the values of some parameters, partial evaluation techniques generate a specialized program with respect to the values of those parameters [11, 17]. Most of those techniques have been studied as source-to-source transformation systems; i.e., they analyze programs in a high-level language and generate specialized programs in the same language. They have been successful in the optimization of various programs, such as interpreters, scientific application programs, and graphical application programs [4, 13].

Run-time specialization (RTS) techniques j 1 1 )l 1 U 1 iSI22j efficiently perform par- 
tial evaluation at run-time (1) by constructing a specializer (or a generating 
extension) for each source program at compile-time and (2) by directly gener- 
ating native machine code at run-time. The drastically improved specialization 
speed enables programs to be specialized by using values that are computed 
at run-time, which means that RTS provides more specialization opportunities 
than compile-time specialization. Several studies reported that RTS can improve the performance of programs for numerical computation [18, 22], an operating system kernel, an interpreter of a simple language, etc.

One of the problems of RTS systems is a trade-off between efficiency of specialization and efficiency of specialized code. For example, Tempo generates specialized programs by merely copying pre-compiled native machine code. On average, the generated code is 20% slower than code generated by compile-time specialization [22]. Of course, we could optimize specialized programs at run-time by optimizing the generated code after specialization. This, however, makes amortization^ more difficult.
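The amortization condition defined in the footnote can be made concrete. Suppose specialization costs T_spec and each call of the specialized code saves Δ = T_unspecialized − T_specialized. The cost is then amortized after the smallest number of calls r with r·Δ > T_spec, and any run-time optimization increases T_spec and therefore r. The Java sketch below is only an illustration. The class, the method, and all timing figures are hypothetical.

```java
// Sketch (hypothetical names and figures): break-even point of run-time specialization.
public class Amortization {
    /**
     * Smallest number of calls r with r * (tUnspecialized - tSpecialized) > tSpecialize,
     * or -1 if the specialized code is not faster, so the cost is never amortized.
     */
    public static long breakEvenCalls(double tSpecialize, double tUnspecialized, double tSpecialized) {
        double saving = tUnspecialized - tSpecialized;   // time saved per call
        if (saving <= 0) return -1;
        return (long) Math.floor(tSpecialize / saving) + 1;
    }

    public static void main(String[] args) {
        // Hypothetical: specialization (including JIT) costs 50 ms; one call drops from 4 ms to 1 ms.
        System.out.println(breakEvenCalls(50.0, 4.0, 1.0));
    }
}
```

With these hypothetical figures the specialization pays off from the 17th call onwards (17 × 3 ms = 51 ms > 50 ms).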

In this paper, we describe an alternative approach called run-time bytecode 
specialization (BCS), which is an automatic bytecode-to-bytecode transformation 
system. The characteristics of our approach are: (1) the system directly analyzes 
programs and constructs specializers in a bytecode language; and (2) the specializer generates programs in the bytecode language, which makes it possible to
apply optimizations after specialization by using just-in-time (JIT) compilation 
techniques. 

As the bytecode language, we choose the Java virtual machine language (JVML) [12], which provides practical portability. The system can use existing compilers as its front-end, and widely available Java virtual machines, which may include sophisticated JIT compilers, as its back-end. The analysis of JVML programs is based on a type system derived from the one for JVML [23]. A specializer can be constructed essentially from the result of the analysis, and can perform method inlining at run-time.

Thus far, we have developed our prototype system for a non-object-oriented 
subset of JVML; the system supports only primitive types, arrays, and static
methods. Although the system does not yet support important language features 
in Java, such as objects and virtual methods, it has sufficient functionality to 
demonstrate fundamental costs in our approach, such as efficiency of specialized 
code and overheads of specialization and JIT compilation. 

The rest of the paper is organized as follows. Section 2 overviews existing RTS techniques and their problems. BCS is described in Section 3. Section 4 presents the performance measurement of our current implementation. Section 5 discusses related studies. Section 6 concludes the paper.

2 Run-Time Specialization 

2.1 Program Specialization 

An offline partial evaluator processes programs in two phases: binding-time anal- 
ysis (BTA) and specialization. BTA takes a program and a list of the binding- 
times of arguments of a method in the program and returns an annotated pro- 

^ In RTS systems, the specialization of a procedure is amortized once the total reduction in the procedure's execution time exceeds the time spent on specialization.






Hidehiko Masuhara and Akinori Yonezawa 



gram in which every sub-expression is associated with binding-time. The binding- 
time of an expression is static if the value can be computed at specialization time 
or dynamic if the value is to be computed at execution time. For example, when 
BTA receives 



class Power {
    static int power(int x, int n) {
        if (n==0) return 1;
        else return x*power(x,n-1);
    }
}

with list [dynamic, static] as the binding-times of x and n, it adds static anno- 
tations to the if and return statements, to the expressions n==0 and n-1, and 
to the call to power. The remaining expressions, namely the constant 1 in the 
‘then’ branch, the variable x, and the multiplication, are annotated as dynamic. 

In the specialization phase, the annotated program is executed with the values of the static parameters, and a specialized program is returned as a result. The execution rules for static expressions are the same as the ordinary ones. The rule for dynamic expressions is to return the expression itself. For example, execution of the annotated power with argument 3 for the static parameter n proceeds as follows: it tests “n==0”, selects the ‘else’ branch, computes n-1, and recursively executes power with 2 (i.e., the current value of n-1) as an argument. It eventually receives the result of the recursive call, which should be “x*x*1”, and finally returns “x*x*x*1” by prepending “x*” to the received result.
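The specialization of power described above can be mimicked by a hand-written generating extension: static computations are executed, and dynamic ones are emitted as text. The Java sketch below is ours, and the class PowerSpecializer and the residual method name power_3 are hypothetical. Unlike BCS, it produces source text rather than bytecode.

```java
// Sketch (hypothetical names): specializer for Power.power with x dynamic and n static.
public class PowerSpecializer {
    /** Static parts (n == 0, n - 1, the recursive call) run now; dynamic parts become text. */
    public static String specialize(int n) {
        if (n == 0) return "1";              // dynamic constant 1 is residualized
        return "x*" + specialize(n - 1);     // dynamic multiplication x * ... is residualized
    }

    public static void main(String[] args) {
        // Emit a residual method for n = 3.
        System.out.println("static int power_3(int x) { return " + specialize(3) + "; }");
    }
}
```

Running it prints static int power_3(int x) { return x*x*x*1; }, which is the residual program derived above.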



2.2 Overview 



Run-time specialization techniques efficiently specialize programs by generating 
specialized programs at machine code level [1 1 III 21 1 t>l I iSI22l2Dj . 

Given a source program and binding-time information, an RTS system effi- 
ciently generates specialized programs at run-time in the following way. It first 
performs BTA on the source program, similar to compile-time specialization 
systems. It then compiles dynamic expressions into fragments of machine code, 
called templates. It also constructs a specializer that has the static expressions 
in the source program and operations for copying corresponding templates into 
memory in place of the dynamic expressions. At run-time, when a specializer 
is executed with a static input, it executes the static expressions, and directly 
generates a specialized program at machine code level. 



2.3 Problems 

Efficiency. There is a trade-off between efficiency of specialization processes 
and efficiency of specialized code in RTS systems. 

A program that is specialized by an RTS system is usually slower than that 
specialized by a compile-time specialization system. This is because RTS systems 
rarely apply optimizations, such as instruction scheduling and register allocation, to the specialized code, for the sake of efficient specialization. Furthermore, programs that have a number of method invocations (or function calls) can be much slower, since several RTS systems do not perform method inlining.

For example, Noël et al. showed, in their study on Tempo [22], that run-time specialized programs have a 20% overhead over compile-time specialized ones on average. As we will see in Section 4, the overhead in a run-time specialized program can overwhelm the speedup obtained by specialization; i.e., the specialized program can become slower than the original program.

If an RTS system performed optimizations at run-time, specialized programs 
would become faster. In fact, there are several systems that optimize specialized
code at run-time l2lltil24l . However, the time spent for the optimization processes 
makes amortization more difficult. 

Consequently, an RTS system that can flexibly balance a degree of opti- 
mization of specialized code and time for generating specialized code would be 
beneficial. 



Portability. In order to directly generate machine code, RTS systems often de- 
pend on the target machine architecture. A typical RTS system includes its own 
compiler from source code (usually in a high-level language) to native machine 
code. 

Several techniques have been proposed to overcome this problem. For example, Tempo uses standard C compilers for creating templates [8, 22]. 'C, which is a language with dynamic code generation mechanisms, generates specialized code in retargetable virtual machine languages called vcode and icode.

3 Run-Time Bytecode Specialization 

3.1 Overview 

Our proposed run-time bytecode specialization (BCS) technique uses a virtual 
machine (bytecode) language as its source and target languages. It takes a byte- 
code program as its input, and constructs a specializer in the same bytecode 
language. At run-time, the specializer, which runs on a virtual machine, gener- 
ates specialized programs in the same bytecode language. As a virtual machine 
language, we choose the Java virtual machine language (JVML) [12].

We aim to solve the problems in the previous section in the following ways: 

Efficiency. Instead of directly generating specialized code in a native machine 
language, BCS generates it in an intermediate (bytecode) language. When 
the system is running on a JVM with a JIT compiler, the specialized code 
is optimized into a native machine language before execution. We also plan 
to control the quality of specialized code and the speed of JIT compilation 
processes by integrating our system with JVMs that have interfaces to their
JIT compilers, such as OpenJIT [23].

Another functionality of specializers in BCS is that they can perform method
inlining at run-time. Although the specialized code with method inlining has
a certain amount of overhead for saving/restoring local variables at bytecode
level, according to our experiments a JIT compiler can remove most of it.
Portability. As shown by a previous run-time code generation system [24],
code generation at the virtual machine level can improve portability. The current
BCS system generates specialized code in standard JVML; the generated code
can be executed on JVMs, which are widely available on various platforms.

The input to the BCS system is a JVML program. This means that the
system does not depend on the syntax of any high-level language. Instead,
run-time specialization can be applied to any language for which there exists a
compiler into JVML. In fact, there are several compilers from various high-level
languages to JVML (e.g., [3]), which could be used as front-ends once we
extend our system to support the full set of JVML.

Fig. 1. Overview of BCS.

As shown in Figure 1, a compiler first translates a source program written in a high-level language (e.g., Java) into JVML bytecode. The compiled program is annotated by our BTA algorithm. From the annotated program, a specializer that generates the dynamic instructions is constructed. At run-time, the specializer takes the values of the static parameters and generates a specialized program in bytecode by writing the dynamic instructions into an array. Finally, the JVM's class loader and the JIT compiler translate the specialized bytecode program into machine code, which can be executed as a method in the Java language.

In the following subsections, we briefly outline each process in BCS. A more detailed description can be found elsewhere [21].

3.2 Source and Target Language 

As mentioned, our source and target language is JVML, which is a stack-machine 
language with local variables and instructions for manipulating objects. Cur- 
rently, a subset of the JVML instructions is supported. Restrictions are: 

— Only primitive types and array types are supported (i.e., objects are not supported yet).
Method int Power.power(int, int)
     0  iload 1                                  // push n
     1  ifne 4                                   // go to 4 if n ≠ 0
     2  iconst 1                                 // (case n = 0) push 1
     3  ireturn                                  // return 1
     4  iload 0                                  // (case n ≠ 0) push x
     5  iload 0                                  // push x as arg. #0
     6  iload 1                                  // push n
     7  iconst 1                                 // push 1
     8  isub                                     // compute (n − 1) as arg. #1
     9  invokestatic int Power.power(int, int)   // call method
    10  imul                                     // compute x × (return value)
    11  ireturn                                  // return x × (return value)

Fig. 2. Method power in JVML.



— All methods must be class methods (i.e., methods are declared static).

— Subroutines (jsr and ret), exceptions, and multi-threading are not supported.

Figure 2 shows the result of compiling method power (Section 2.1) into JVML. A method invocation creates a frame that holds an operand stack and local variables. An instruction first pops zero or more values off the stack, performs a computation, and pushes zero or one value onto the stack.

The iconst n instruction pushes a constant n onto the stack. The isub (or imul) instruction pops two values off the stack and pushes their difference (or product) onto the stack. The iload x instruction pushes the current value of local variable x onto the stack. The istore x instruction pops a value off the stack and assigns it to local variable x. The ifne L instruction pops a value off the stack and jumps to address L in the current method if the value is not zero. The invokestatic τ m(τ1, ..., τn) instruction invokes method m with the top n values on the stack as arguments. The invokestatic instruction (1) pops n values off the stack, (2) saves the current frame and program counter, (3) assigns the popped values to variables 0, ..., (n − 1) in a newly allocated frame, and (4) jumps to the first address of method m. The ireturn instruction (1) pops a value off the stack, (2) disposes of the current frame and restores the saved one, (3) pushes the value onto the restored stack, and (4) jumps to the address following the saved program counter. The caller uses the value at the top of the stack as the returned value.
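The informal instruction semantics above can be captured by a tiny interpreter. The following Java sketch is ours, not part of BCS. It hard-codes the instructions of Fig. 2 and treats the figure's addresses as instruction indices, whereas the real JVM uses byte offsets. It implements invokestatic as a recursive Java call, so that the Java call stack plays the role of the saved frames and program counters. MiniJvm and invoke are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch (hypothetical names): an interpreter for the JVML subset of Fig. 2.
public class MiniJvm {
    // Fig. 2 as (opcode, operand) pairs. The operand of ifne is a target index; the operand
    // of invokestatic is the arity of Power.power, the only callee in this example.
    static final String[] OPS = {"iload", "ifne", "iconst", "ireturn", "iload", "iload",
            "iload", "iconst", "isub", "invokestatic", "imul", "ireturn"};
    static final int[] ARGS = {1, 4, 1, 0, 0, 0, 1, 1, 0, 2, 0, 0};

    /** Runs Power.power in a fresh frame whose local variables are the given arguments. */
    public static int invoke(int... locals) {
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack of this frame
        int pc = 0;
        while (true) {
            switch (OPS[pc]) {
                case "iconst": stack.push(ARGS[pc]); pc++; break;
                case "iload": stack.push(locals[ARGS[pc]]); pc++; break;
                case "isub": { int b = stack.pop(), a = stack.pop(); stack.push(a - b); pc++; break; }
                case "imul": { int b = stack.pop(), a = stack.pop(); stack.push(a * b); pc++; break; }
                case "ifne": pc = (stack.pop() != 0) ? ARGS[pc] : pc + 1; break;
                case "invokestatic": {
                    int[] args = new int[ARGS[pc]];
                    for (int i = args.length - 1; i >= 0; i--) {
                        args[i] = stack.pop();           // the last argument is on top
                    }
                    stack.push(invoke(args));            // the Java call stack saves frame and pc
                    pc++;
                    break;
                }
                case "ireturn": return stack.pop();      // discard the frame, hand the value back
                default: throw new IllegalStateException("unsupported: " + OPS[pc]);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(2, 10));   // power(2, 10), prints 1024
    }
}
```

For example, invoke(2, 10) evaluates power(2, 10) and returns 1024.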



3.3 Binding-Time Analysis 

Strategy. Our BTA algorithm is a flow-sensitive and monovariant (context-insensitive) analysis for the subset of JVML, based on a type system. From the viewpoint of BTA, the subset of JVML is mostly similar to high-level imperative languages such as C. Therefore, unlike analyses for functional languages, the analysis has to handle the following aspects with care:
— Since compilers may assign different variables to an operand-stack entry or a local variable, the analysis should be flow sensitive; i.e., it should allow an operand-stack entry or a local variable to have a different binding-time at each program point in a method.

— As JVML is an unstructured language (i.e., it has a ‘goto’ instruction), merge points of conditional jumps and loops are implicit. The algorithm therefore has to infer this information.

The BTA algorithm is based on a type system, following the algorithms used for functional languages [14]. As the type system, we use a modified version of the type system for JVML proposed by Stata and Abadi^. The algorithm, which is described elsewhere [21], consists of the following steps:

1. For each address in each method of a given program, it first assigns three
type variables: one to the operand stack, one to the frame of local variables,
and one to the instruction at that address. By giving different type variables
to the local variables at each address, the system achieves flow sensitivity, as
does the original system of Stata and Abadi.

2. It then applies typing rules to each instruction of a method, and generates
constraints among the type variables.

3. It also generates additional constraints that treat non-local side-effects under
dynamic control na chapter 11] by using the result of a flow analysis.

4. It finally computes a minimal assignment to the type variables that satisfies
all the generated constraints.
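Step 4 asks for a minimal assignment. Over the two-point lattice S < D, the least solution of monotone constraints can be found by starting with every variable static and propagating dynamic-ness until nothing changes. The Java sketch below illustrates this under a simplifying assumption: only the constraint shapes "D ≤ α" and "α ≤ β" are modelled, whereas the actual rules also relate stack, frame and instruction types. `BindingTimeSolver` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Binding-time variables 0..n-1 range over {S, D} ordered S < D.
// Only two constraint shapes are modelled here (our simplification):
//   forceDynamic(a): D <= a
//   flowsTo(a, b):   a <= b, i.e. if a is dynamic then b must be dynamic
final class BindingTimeSolver {
    private final List<List<Integer>> successors = new ArrayList<>();
    private final List<Integer> forced = new ArrayList<>();

    BindingTimeSolver(int numVars) {
        for (int i = 0; i < numVars; i++) successors.add(new ArrayList<>());
    }

    void forceDynamic(int a) { forced.add(a); }

    void flowsTo(int a, int b) { successors.get(a).add(b); }

    // Every variable starts static; only variables reachable from a forced one
    // become dynamic, which yields the least assignment satisfying all constraints.
    boolean[] solve() {
        boolean[] dynamic = new boolean[successors.size()];
        Deque<Integer> work = new ArrayDeque<>();
        for (int a : forced) {
            if (!dynamic[a]) { dynamic[a] = true; work.push(a); }
        }
        while (!work.isEmpty()) {
            int a = work.pop();
            for (int b : successors.get(a)) {
                if (!dynamic[b]) { dynamic[b] = true; work.push(b); }
            }
        }
        return dynamic;
    }
}
```

For power, one might introduce variables for x, n, n − 1 and the result of imul, with D ≤ x, n ≤ (n − 1) and x ≤ result; the solver then marks x and the result dynamic and leaves n and n − 1 static.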



Example. Figure 3 shows an example BTA result of power when the binding-times
of x and n are dynamic and static, respectively. The binding-time of
an instruction, which is displayed in the B column, is either S (static) or D
(dynamic). The binding-time of a stack, which is displayed in the T column, is
written as $\tau_1 \cdot \tau_2 \cdots \tau_n \cdot \epsilon$ (a stack with n values whose types are $\tau_1, \tau_2, \ldots, \tau_n$, from
the top value). The binding-time of a frame of local variables, which is displayed
in the F column, is denoted as $\emptyset$ (an empty frame) or $[i_k \mapsto \tau_k]$ (a frame whose
local variable $i_k$ has type $\tau_k$). Note that the domains of the frame types ‘shrink’
along the execution paths. This is because our BTA rules generate constraints
only on the types of live local variables, so the types of unused ones do not appear
in the result.

The BTA result is effectively the same as that of a source-level BTA; i.e.,
instructions that correspond to a static or dynamic expression at the source level
have static or dynamic types, respectively.

^ Stata and Abadi designed their type system to formalize the JVM’s verification rules
for subroutines (jsr and ret). Our current analysis merely uses the style of their
formalization, and omits the complicated rules for subroutines.

^ The instruction sequence was slightly modified from Figure 2 so that every conditional
jump has its merge point within the method. A preprocessor inserts a unique ireturn
instruction at the end of the method, and replaces all other ireturn instructions with goto
instructions to the inserted ireturn instruction.
Fig. 3. BTA result of power.



3.4 Specializer Construction 

From an original program and a BTA result, a specializer is constructed in
“pure” JVML. It generates specialized code on a per-instruction basis at
run time jIS|. For each dynamic instruction in the original program, the specializer
has a sequence of instructions that writes the bytecode of the instruction into an
array. The specializer also performs method inlining by successively running the
specializers of a method’s caller and callee, and by inserting a sequence of instructions
that saves and restores local variables appropriately.

Here, we describe the construction of a specializer by using pseudo-instruc- 
tions. Note that those pseudo-instructions are used only for explanation, and 
they are replaced with sequences of pure JVML instructions in the actual spe- 
cializer. The specializer is executable as a Java method. 

The extended JVML for defining specializers contains the JVML instructions
and the pseudo-instructions GEN instruction, LIFT, LABEL L, SAVE n [x0, ...],
RESTORE, and INVOKEGEN m [x0, ...], where instruction is a standard JVML
instruction. Figure 4 shows an example definition of the specializer power_gen with
pseudo-instructions, constructed from the method power. A specializer is constructed
by translating each annotated instruction as follows.

— Static instruction i becomes instruction i of the specializer. 

— Dynamic instruction i is translated into pseudo-instruction GEN i. When GEN
i is executed at specialization time, the binary representation of i is appended
at the next free position of the array in which the specialized code is stored.

— When an instruction has a different binding-time from that of the value it
pushes or pops, pseudo-instruction LIFT is inserted. More precisely, (1) when a
static instruction at pc pushes a value onto the stack and $T[pc+1] = D \cdot \alpha$,
where $\alpha$ denotes an arbitrary stack type, LIFT is inserted after the instruction.
The iconst_1 at L1 in Figure 4 is an example. (2) When a dynamic instruction
at pc pops a value off the stack and $T[pc] = S \cdot \alpha$, LIFT is inserted before the
instruction. The execution of a LIFT instruction pops a value n off the stack and
generates the instruction “iconst n” in the specialized program.

— Static invokestatic m(t1, ..., tn) is translated into pseudo-instruction
INVOKEGEN m_gen(tj1, ..., tjk) [x0, x1, ...], where tj1, ..., tjk are the types
of the static arguments, and x0, x1, ... are the dynamic local variables at the
current address. When INVOKEGEN is executed, (1) instructions that save the
local variables x0, x1, ... to the stack and move values on top of the stack to
the local variables are generated, (2) the specializer m_gen is invoked, and (3)
instructions that restore the saved local variables x0, x1, ... are generated. The
number of values moved from the stack to the local variables in (1) is the
number of dynamic arguments of m.

— When conditional jump ifne L is dynamic, the specializer has an instruction
that generates ifne, followed by the instructions for the ‘then’ and ‘else’
branches. In other words, it generates specialized instruction sequences for
both branches, one of which is selected at run time by the dynamic condition.
First, the jump instruction is translated into two pseudo-instructions: GEN ifne
L and SAVE n [x0, x1, ...], where n is the number of static values on the stack
that will be popped during the execution of the ‘then’ branch, and [x0, x1, ...]
is the list of static local variables that may be updated during the execution of
the ‘then’ branch. In addition, the pseudo-instruction sequence LABEL L; RESTORE
is inserted at label L. When SAVE is executed at specialization time, the top n
values on the current stack and the local variables x0, x1, ... are saved. The
execution of RESTORE puts the saved values back on the stack and in the frame.

Method Power.power_gen(int)
      iload_1
      ifne L2
  L1: iconst_1
      LIFT
      goto L0
  L2: GEN iload_0
      GEN iload_0
      iload_1
      iconst_1
      isub
      INVOKEGEN Power.power_gen(int) []
      GEN imul
      goto L0
  L0: return

Fig. 4. Specializer definition with pseudo-instructions.
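To make the GEN, LIFT and INVOKEGEN rules concrete, the following Java sketch hand-writes a generating extension for power, assuming (as in Figure 3) that x is dynamic and n is static. The names `PowerGen`, `gen`, `lift`, `powerGen` and `specialize` are our own hypothetical helpers: the actual specializer is produced automatically in JVML, and this LIFT only covers small constants (iconst and bipush). Since there are no dynamic locals to save, INVOKEGEN reduces here to emitting istore_0, which moves the dynamic argument into local 0 before the callee's specializer runs. The opcode values are the standard JVM ones.

```java
import java.io.ByteArrayOutputStream;

// Hand-written sketch of power_gen: x (local 0) is dynamic, n is static.
final class PowerGen {
    // Standard JVM opcode values for the instructions used below.
    static final int ICONST_0 = 0x03, BIPUSH = 0x10, ILOAD_0 = 0x1a, ISTORE_0 = 0x3b,
                     IMUL = 0x68, IRETURN = 0xac;

    private final ByteArrayOutputStream code = new ByteArrayOutputStream();

    // GEN i: append the binary form of a dynamic instruction to the code array.
    private void gen(int opcode) { code.write(opcode); }

    // LIFT: turn a static value into an instruction that pushes it at run time.
    // Only the small-constant cases are sketched.
    private void lift(int value) {
        if (value >= 0 && value <= 5) {
            gen(ICONST_0 + value);          // iconst_0 .. iconst_5
        } else if (value >= -128 && value <= 127) {
            gen(BIPUSH);
            code.write(value);              // write() keeps the low 8 bits
        } else {
            throw new UnsupportedOperationException("sipush/ldc not sketched");
        }
    }

    // Specializer for power(x, n): static parts run now, dynamic parts are emitted.
    private void powerGen(int n) {
        if (n == 0) {
            lift(1);                        // static constant 1 used in a dynamic context
        } else {
            gen(ILOAD_0);                   // GEN iload_0 (left operand of imul)
            gen(ILOAD_0);                   // GEN iload_0 (dynamic argument x)
            // INVOKEGEN with no locals to save: move the dynamic argument into
            // local 0, then run the callee's specializer with static n - 1.
            gen(ISTORE_0);
            powerGen(n - 1);
            gen(IMUL);                      // GEN imul
        }
    }

    // Returns the bytecode of the residual method body for a given static n.
    static byte[] specialize(int n) {
        PowerGen g = new PowerGen();
        g.powerGen(n);
        g.gen(IRETURN);                     // the unique ireturn at the end
        return g.code.toByteArray();
    }
}
```

With n = 2 the sketch emits iload_0, iload_0, istore_0, iload_0, iload_0, istore_0, iconst_1, imul, imul, ireturn, the same sequence as the specialized method in Figure 5.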



Since JVML is an unstructured language, constructing a generating extension
whose control flow visits all the nodes in both branches is not trivial. The algorithm
for constructing such a generating extension will be described elsewhere.
Method int power_2(int)
   0 iload_0
   1 iload_0
   2 istore_0
   3 iload_0
   4 iload_0
   5 istore_0
   6 iconst_1
   7 imul
   8 imul
   9 ireturn

Fig. 5. Specialized version of power with respect to 2.



3.5 Specializer Execution 

The specializer definition is further translated into a Java method that
takes (1) several parameters needed for specialization, including an array byte[]
code in which the instructions of the specialized program are written, and (2) the
static arguments of the original method.

When a program uses the specializer, the following operations are performed.
(CP creation) A ‘Constant Pool’ (CP) object that records lifted values during
specialization is created. (Specializer execution) The specializer method
is invoked with the static arguments and the other information needed for
specialization. (Class finalization) From the specialized instructions written in
a byte array and the CP object, a ClassFile image is created. (Class loader
creation) A ClassLoader object is created. (Class loading) Using the
ClassLoader object, the ClassFile image is loaded into the JVM, which defines
a new class with the specialized method. (Instance creation) An instance of
the newly defined class is created. Finally, the program can call the specialized
method via a virtual method of the instance.

Figure 5 shows the instructions of power specialized with 2 as the static
argument. Some instructions, such as those that load a value immediately after
storing it, are unnecessary. They arise from saving and restoring local
variables around inlined methods.
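The class finalization, class loader creation and class loading steps can be tried directly with the standard `ClassLoader.defineClass` method. The Java sketch below rests on our own simplifying assumptions: it wraps the ten instructions of Figure 5 into a minimal class file with a single static method and no constant-pool entries for lifted values, and it calls the method by reflection instead of through a virtual method of an instance. The names `SpecializedLoader`, `OneShotLoader`, `Power2` and `run` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;

final class SpecializedLoader {
    // Wraps a method body in a minimal class file containing one
    // "public static int <methodName>(int)" method and nothing else.
    static byte[] classFile(String className, String methodName,
                            byte[] code, int maxStack, int maxLocals) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                                   // minor_version
            out.writeShort(49);                                  // major_version (Java 5): no StackMapTable needed
            out.writeShort(8);                                   // constant_pool_count = 7 entries + 1
            out.writeByte(1); out.writeUTF(className);           // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF(methodName);          // #5 Utf8
            out.writeByte(1); out.writeUTF("(I)I");              // #6 Utf8 descriptor
            out.writeByte(1); out.writeUTF("Code");              // #7 Utf8
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(1);                                   // methods_count
            out.writeShort(0x0009);                              // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                   // name_index
            out.writeShort(6);                                   // descriptor_index
            out.writeShort(1);                                   // attributes_count
            out.writeShort(7);                                   // attribute name "Code"
            out.writeInt(12 + code.length);                      // attribute_length
            out.writeShort(maxStack);
            out.writeShort(maxLocals);
            out.writeInt(code.length);
            out.write(code);
            out.writeShort(0);                                   // exception_table_length
            out.writeShort(0);                                   // Code attributes_count
            out.writeShort(0);                                   // class attributes_count
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A fresh loader per specialized class, as done in the experiments.
    static final class OneShotLoader extends ClassLoader {
        Class<?> define(String name, byte[] image) {
            return defineClass(name, image, 0, image.length);
        }
    }

    // Loads the residual code as a new class and calls it once via reflection.
    static int runSpecialized(byte[] code, int maxStack, int arg) {
        byte[] image = classFile("Power2", "run", code, maxStack, 1);
        Class<?> cls = new OneShotLoader().define("Power2", image);
        try {
            Method run = cls.getMethod("run", int.class);
            return (Integer) run.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the Figure 5 bytes (max stack depth 3) and the argument 7 returns 49. Class-file version 49 is chosen so that straight-line code needs no StackMapTable attribute.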

4 Performance Measurement 

4.1 An Application Program: Mandelbrot Sets Drawer 

As a target of specialization, we took a non-trivial application program that
interactively displays Mandelbrot sets. The user of the program can enter the
definition of a function, and the program displays the image of the Mandelbrot
set defined by that function. Since the function is given interactively,

^ Despite its name, a ClassFile image in our system is created as a byte array. No
files are explicitly created for class loading.

^ Since some JVM implementations slowed down significantly in our experiments when
a single ClassLoader object loaded many classes, we create a class loader for each piece
of specialized code. Section 4.3 shows that the time for creating a ClassLoader object
is insignificant among the overall specialization overheads.
Fig. 6. Execution times of loops of specialized and unspecialized eval.



the program defines an interpreter for evaluating mathematical expressions. To
draw an image of the set, the application has to evaluate the function
more than one million times. This means that run-time specialization of the
interpreter with respect to a given expression could improve the performance of
the drawing process.

In our performance measurements, the method eval and its auxiliary methods
in the interpreter, which take an expression and a store and return the
value of the expression, are specialized with respect to the expression “z*z+c”.
Since the current BCS implementation does not support objects, we modified the
methods to use arrays for representing expressions and stores.

We measured execution times of the target methods on two JVMs with
different JIT compilers, namely, the Sun “Classic” VM for JDK 1.2.1 with the sunwjit
compiler, and the Sun “HotSpot” VM for JDK 1.2.2, in order to examine the impact
of the JIT strategy on specialization performance. All programs are executed
on a Sun Enterprise 4000 with 14 UltraSPARCs at 167 MHz, 1.2 GB of memory, and
SunOS 5.6. Execution times are measured by inserting gethrvtime system calls,
which are invoked via a native method.

4.2 Performance of Specialized Method 

We measured the performance of three versions of the eval method on the
above-mentioned JVMs. The first one is the ‘original’ unspecialized method. The
second one is a run-time specialized (‘RTS’) method generated by the BCS system.
The third one is a compile-time specialized (‘CTS’) method, which is obtained
by applying Tempo after translating the original method into a C function.
Table 1. Execution times and relative speeds of eval method.

              execution times (μsec.)                               relative speed
          original             RTS                 CTS
VM        Bo       Jo          Br       Jr         Bc       Jc         Bo/Br   Bo/Bc   Br/Bc
Classic   6.405      2,513     2.255      1,330    2.257      1,792    2.841   2.838   0.999
HotSpot   2.774    245,156     0.691    146,795    0.659    159,688    4.014   4.212   1.049


Table 2. Breakdown of specialization overheads.

                           Classic                     HotSpot
process                    time (μsec.)  (ratio)       time (μsec.)  (ratio)
CP creation                   46.38       (1.7%)          95.91       (3.1%)
specializer execution         61.67       (2.3%)         194.81       (6.2%)
class finalization            55.77       (2.1%)         125.18       (4.0%)
class loader creation         16.68       (0.6%)          22.14       (0.7%)
class loading              1,907.33      (71.8%)       1,518.18      (48.5%)
instance creation            569.73      (21.4%)       1,172.96      (37.5%)
total (S)                  2,657.57     (100.0%)       3,129.19     (100.0%)



Figure 6 shows the execution times of the method, which are measured in the
following way. A ClassLoader object in our benchmark program first loads a
new class that contains the (either specialized or unspecialized) eval method.
The program then measures the execution time of a loop that repeatedly invokes
the eval method. Note that the measured time does not include the specialization
process, but does include the time of JIT compilation, because JVMs
perform JIT compilation during method invocations. As a result, the curves of
the graph are not linear for small iteration numbers.

We therefore estimated, for each curve, the execution time of the JIT-compiled
body of the method (hereafter referred to as B) and of the JIT compilation process
(J), by using a linear approximation of the curve at large iteration numbers.
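One way to realize this linear approximation is an ordinary least-squares fit of t(N) = J + B·N over samples taken at large iteration counts N: the slope estimates B and the intercept estimates J. The paper does not state which fitting method was used, so this Java sketch, including the `LinearFit` class and its field names, is an assumption.

```java
// Estimates J (intercept) and B (slope) from (iterations, time) samples
// taken at large iteration counts, assuming t(N) = J + B * N there.
final class LinearFit {
    final double body;   // B: time per iteration of the JIT-compiled body
    final double jit;    // J: one-time JIT compilation cost

    private LinearFit(double body, double jit) {
        this.body = body;
        this.jit = jit;
    }

    // Ordinary least squares: slope = cov(N, t) / var(N), intercept = mean(t) - slope * mean(N).
    static LinearFit leastSquares(double[] iterations, double[] times) {
        int k = iterations.length;
        double meanN = 0, meanT = 0;
        for (int i = 0; i < k; i++) {
            meanN += iterations[i];
            meanT += times[i];
        }
        meanN /= k;
        meanT /= k;
        double cov = 0, var = 0;
        for (int i = 0; i < k; i++) {
            double dn = iterations[i] - meanN;
            cov += dn * (times[i] - meanT);
            var += dn * dn;
        }
        double slope = cov / var;
        return new LinearFit(slope, meanT - slope * meanN);
    }
}
```

Fed with synthetic samples generated exactly from Jr = 1,330 μsec and Br = 2.255 μsec (the Classic RTS values in Table 1), the fit recovers those two values up to floating-point rounding.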

Table 1 shows the estimated execution times and relative speeds of the body of
the method. As we see in the Jo, Jr and Jc columns, JIT compilation
took from about one millisecond to a few hundred milliseconds, depending on the JIT
compiler. As we see in the Bo/Br and Bo/Bc columns, the run-time specialized
code runs 3-4 times faster than the unspecialized code, and achieves
almost the same speedup factors as the compile-time specialized code.



4.3 Specialization Overheads and Break-Even Points 

Elapsed times for the specialization processes (S) are measured by averaging
10,000 runs. Table 2 shows the time for each sub-process explained in
Section 3.5. As we see, most of the specialization time (about 93% on Classic
and 86% on HotSpot) is spent in the sub-processes inside the JVM, namely, class
loading and instance creation. We presume that some of these overheads could be
removed if we integrated our system with a JIT compiler so that the specializer
directly generates specialized code in the intermediate representation of the JIT
compiler.

Table 3. Break-even points.

VM        over JIT-compiled code    over newly loaded code
Classic   961                       355
HotSpot   71,975                    (less than zero)

A break-even point (BEP) is the number of runs of a specialized program needed
to amortize the specialization cost over the execution time of the unspecialized
program. In programming systems that perform dynamic optimizations, even
unspecialized programs have to pay the overheads of optimization, namely JIT
compilation time. We therefore calculated two BEPs. The first one assumes that
the unspecialized code is already JIT compiled. In this case, the BEP, which is
calculated by the formula (Jr + S)/(Bo − Br), is approximately 1,000-72,000
runs, as shown in Table 3. The second one assumes that the unspecialized code
is newly loaded, and thus pays the cost of JIT compilation during its execution.
In this case the BCS-specialized code exhibits a small BEP, which is computed
by the formula (Jr + S − Jo)/(Bo − Br). Note that the benchmark application,
in order to draw an image of a given expression, executes the eval method a
much larger number of times than the BEPs. This means that BCS actually
improves the overall execution time of the application.
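The two BEP formulas can be checked against Tables 1 and 2 with a few lines of Java. The class and method names below are ours, and all arguments are in microseconds.

```java
// Break-even points from the two formulas in the text (times in microseconds).
final class BreakEven {
    // Unspecialized code already JIT compiled: (Jr + S) / (Bo - Br).
    static double overCompiled(double jr, double s, double bo, double br) {
        return (jr + s) / (bo - br);
    }

    // Unspecialized code newly loaded, so it also pays Jo: (Jr + S - Jo) / (Bo - Br).
    static double overNewlyLoaded(double jr, double s, double jo, double bo, double br) {
        return (jr + s - jo) / (bo - br);
    }
}
```

With the Classic values (Jr = 1,330, S = 2,657.57, Jo = 2,513, Bo = 6.405, Br = 2.255), the two functions give about 960.9 and 355.3, which round to the 961 and 355 of Table 3; with the HotSpot values the second formula is negative, matching “(less than zero)”.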

4.4 Comparison to a Native Code Run-Time Specialization System 

In order to compare speedup factors and specialization overheads with a run-time
native-code specialization system, we also wrote the same interpreter in C,
and specialized it by using Tempo 1.1 94J. We tested two binding-time
configurations for specializing the interpreter. The first specializes the function
with respect to three out of five arguments (shown in the ‘3/5’ row in Table 4),
which is the same configuration as in the BCS experiment. The second specializes
it with respect to two out of five arguments (the ‘2/5’ row), in which an
array containing a return value index is set to be dynamic. The interpreter is
compiled by GCC 2.7.2 with the -O2 option. All the other execution environments
are the same as before.

Table 4 shows the execution times and specialization times, which are measured
by averaging ten million runs. We observe that the run-time specialized code is
slower than the compile-time specialized one in Tempo. Surprisingly, the
run-time specialized code obtained under the same configuration as the
experiments in the previous subsections is even slower than the original code.
We presume that this anomaly is caused by a number of array accesses whose
indices are ‘lifted’ at specialization time. When we made the array dynamic
(the ‘2/5’ row), the run-time specialized function became faster than the original
one, and its break-even point is smaller than those in BCS.

^ We set both the reentrant and post_inlining options of Tempo to true, and the
compiler options for both templates and specializers to "-O2". We also implemented
an efficient memory allocator for residual code.

Table 4. Execution and specialization times and break-even points of eval in
Tempo.

               execution times (μsec.)            relative speed
# of static   none     RTS               CTS
args.         Bo       Br       S        Bc       Bo/Br   Bo/Bc   Br/Bc    BEP
3/5           1.278    63.591   85.712   0.298    0.02    4.29    213.4    ∞
2/5           1.278     0.789   22.881   0.363    1.62    3.52      2.174  46.8

Comparing the execution times in BCS and in Tempo, we notice that the
compile-time specialized code in the two systems shows similar speedup factors
(Bo/Bc). On the other hand, the speedup factors of the run-time specialized code
(Bo/Br) in Tempo are worse than those in BCS. This can be seen as evidence
for our premise: performing optimizations after specialization could be useful
to improve the performance of run-time specialized code.



5 Related Work 

Tempo is a compile-time and run-time specialization system for the C language [22].
Tempo achieves portability by using the outputs of standard C compilers to
construct specializers. As the specializers simply copy templates to memory at run
time, their BEP numbers are low (3 to 87 runs in their realistic examples).
On the other hand, the specializers perform no optimizations and no function
inlining during run-time specialization.

DyC is another RTS system for the C language [12]. Its analysis and specializers
can directly handle unstructured C programs. The system generates highly
optimized code by using its own optimizing compiler for the Digital Alpha 21164,
and it can perform optimizations during run-time specialization. However, the
optimizations seem to make the BEP numbers larger (around 700 to 30,000), similar
to BCS.

Fabius is an RTS system for a pure functional subset of ML, targeting the MIPS R3000PB|-
Because the source language is a pure functional language, the BTA and
specializer construction in Fabius are simpler than those for imperative and
unstructured languages. Similar to BCS, the specializers in Fabius work on a
per-instruction basis and perform function inlining for tail-recursive functions.
It is also suggested that the specializers could perform register allocation at run time.

Fujinami proposes a run-time specialization system for C++, targeting the MIPS
R4000 and Intel x86[ID|. The system is designed to perform implicit optimizations;
i.e., it specializes a given program with respect to its invariants, which are
determined by an automatic analysis. A specialized program runs twice as
fast as the one compiled by a statically optimizing compiler. His system achieves
this speedup by embedding a number of optimization algorithms into a statically
generated specializer. Our approach, on the other hand, is to optimize
specialized code by using a JIT compiler, which is an independent module.

'C is a language with dynamic code generation mechanisms ^1). Unlike other 
RTS systems, 'C programmers have to explicitly specify binding-times of expres- 
sions. Similar to BCS, the implementation of 'C generates programs in virtual 
machine languages called vcode and icode. The run-time system of icode per- 
forms optimizations including register allocation for generated programs, similar 
to JIT compilers for JVMs. 

Bertelsen proposes, independently of BCS, an algorithm for binding-time
analysis of a JVML subset that includes neither method invocations nor
objects l^. A specialization process based on the analysis is informally discussed;
to the authors’ knowledge, it has not yet been implemented.

JSpec is an off-line, compile-time partial evaluator for Java ptij. The system
analyzes and specializes Java programs by applying Tempo, a partial evaluator
for C, after translating the Java programs into C. This approach can be compared
to ours, which uses a compiler from a high-level language to a bytecode language as
a front-end. Unlike the current BCS implementation, JSpec supports objects, whose
specialization strategies are specified through specialization classes m-

6 Conclusion 

In this paper, we have proposed run-time bytecode specialization (BCS), which 
specializes Java virtual machine language (JVML) programs at run-time. The 
characteristics of this approach are summarized as follows: (1) the system directly 
analyzes a program and creates a specializer in the intermediate language JVML;
and (2) the specializer generates programs in JVML, which makes it possible to 
apply optimizations after specialization by using existing JVMs with just-in-time 
(JIT) compilers. 

The binding-time analysis algorithm is based on a type system, and also uses 
results of flow analysis to correctly handle stacks, local variables, and side-effects. 

Thus far, we have implemented a prototype BCS system for a JVML subset
and have shown that a non-trivial program specialized by our system runs
approximately 3-4 times faster than the unspecialized program. The specialization
cost can be amortized by 1,000 to 72,000 runs, depending on the JVM. These
numbers are, however, worse than those of systems that focus on
specialization speed 113221.

We are now extending our system to support the full JVML. Since the current
implementation only supports primitive types and arrays, rules that properly
handle references to objects should be devised. To support objects and arrays,
the system needs information on whether data is modified by other methods or other
threads. Such information could be obtained either by static analysis (e.g., the
one studied by Choi et al.0) or through user declarations P^. In practice, it
is also important to support other features, such as multi-threading,
subroutines (i.e., jsr and ret instructions in JVML) and exceptions. One might
expect bytecode templates to reduce specialization costs. As our
experiments in Section 4 showed, however, the major sources of specialization
overheads are class loading and JIT compilation. Rather than improving the
performance of the bytecode generation process, our current plan is to generate
a specialized program directly in an intermediate language of a JIT compiler, by
using JVMs with interfaces to JIT compilers EHI .
Abstract. This paper presents a new numerical abstract domain for
static analysis by abstract interpretation. This domain allows us to
represent invariants of the form $(x - y \le c)$ and $(\pm x \le c)$, where x and y
are variable values and c is an integer or real constant.

Abstract elements are represented by Difference-Bound Matrices, widely
used by model-checkers, but we had to design new operators to meet the
needs of abstract interpretation. The result is a complete lattice of infinite
height featuring widening, narrowing and common transfer functions.
We focus on giving an efficient $O(n^2)$ representation and graph-based
$O(n^3)$ algorithms, where n is the number of variables, and claim that
this domain always performs more precisely than the well-known interval
domain.

To illustrate the precision/cost tradeoff of this domain, we have imple- 
mented simple abstract interpreters for toy imperative and parallel lan- 
guages which allowed us to prove some non-trivial algorithms correct. 



1 Introduction 

Abstract interpretation has proved to be a useful tool for eliminating bugs in soft- 
ware because it allows the design of automatic and sound analyzers for real-life 
programming languages. While abstract interpretation is a very general frame- 
work, we will be interested here only in discovering numerical invariants, that is 
to say, arithmetic relations that hold between numerical variables in a program. 
Such invariants are useful for tracking common errors such as division by zero 
and out-of-bound array access. 

In this paper we propose practical algorithms to discover invariants of the
form $(x - y \le c)$ and $(\pm x \le c)$, where x and y are numerical program variables
and c is a numeric constant. Our method works for integers, reals and even
rationals.

For the sake of brevity, we will omit proofs of theorems in this paper. The 
complete proof for all theorems can be found in the author’s MS thesis m 



Previous and Related Work. Static analysis has developed approaches to 
automatically find numerical invariants based on numerical abstract domains 
representing the form of the invariants we want to find. Famous examples are
the lattice of intervals (described in, for instance, Cousot and Cousot’s ISOP’76
paper 0|) and the lattice of polyhedra (described in Cousot and Halbwachs’s
POPL’78 paper 0), which represent invariants of the form $(v \in [c_1, c_2])$
and $(a_1 v_1 + \cdots + a_n v_n \le c)$, respectively. The interval analysis is very
efficient (linear memory and time cost) but not very precise, whereas the polyhedron
analysis is much more precise but has a huge memory cost, exponential in the
number of variables.

Invariants of the form $(x - y \le c)$ and $(\pm x \le c)$ are widely used by the
model-checking community. A special representation, called Difference-Bound Matrices
(DBMs), was introduced, together with many operators, in order to model-check
timed automata (see Yovine’s ES’98 paper PI and Larsen, Larsson, Pettersson
and Yi’s RTSS’97 paper P]). Unfortunately, most of these operators are tied to
model-checking and are of little interest for static analysis.

Our Contribution. This paper presents a new abstract numerical domain 
based on the DBM representation, together with a full set of new operators and 
transfer functions adapted to static analysis. 

Sections 2 and 3 present a few well-known results about potential constraint
sets and briefly introduce Difference-Bound Matrices. Section 4 presents
operators and transfer functions that are new (except for the intersection operator)
and adapted to abstract interpretation. In Section 5, we use these operators to
build lattices, which can be complete under certain conditions. Section 6 shows
some practical results we obtained with an example implementation and Section
7 gives some ideas for improvement.

2 Difference-Bound Matrices 

Let $\mathcal{V} = \{v_1, \ldots, v_n\}$ be a finite set of variables with values in a numerical set $\mathbb{I}$
(which can be the set $\mathbb{Z}$ of integers, the set $\mathbb{Q}$ of rationals or the set $\mathbb{R}$ of reals).

We focus, in this paper, on the representation of constraints of the form
$(v_j - v_i \le c)$, $(v_i \le c)$ and $(v_i \ge c)$, where $v_i, v_j \in \mathcal{V}$ and $c \in \mathbb{I}$. By choosing one
variable to be always equal to 0, we can represent the above constraints using
only potential constraints, that is to say, constraints of the form $(v_j - v_i \le c)$.
From now on, we will choose $v_2, \ldots, v_n$ to be program variables, and $v_1$ to be
the constant 0, so that $(v_i \le c)$ and $(v_i \ge c)$ are rewritten $(v_i - v_1 \le c)$ and
$(v_1 - v_i \le -c)$. We assume from now on that we work only with potential constraints
over the set $\{v_1, \ldots, v_n\}$.

Difference-Bound Matrices. We extend $\mathbb{I}$ to $\overline{\mathbb{I}} = \mathbb{I} \cup \{+\infty\}$ by adding the $+\infty$
element. The standard operations $\le$, $=$, $+$, $\min$ and $\max$ are extended to $\overline{\mathbb{I}}$ as
usual (we will not use operations, such as $-$ or $*$, that may lead to indeterminate
forms).

Any set C of potential constraints over $\mathcal{V}$ can be represented uniquely by an
$n \times n$ matrix in $\overline{\mathbb{I}}$, provided we assume, without loss of generality, that there do not
exist two potential constraints $(v_j - v_i \le c)$ in C with the same left member and
different right members. The matrix m associated with the potential constraint
set C is called a Difference-Bound Matrix (DBM) and is defined as follows:

$$ m_{ij} \triangleq \begin{cases} c & \text{if } (v_j - v_i \le c) \in C, \\ +\infty & \text{elsewhere.} \end{cases} $$
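A DBM can be built directly from a constraint set. The Java sketch below is our own illustration (the class `DbmBuilder` and its methods are hypothetical): it uses 0-based indices, so index 0 plays the role of $v_1$, represents $+\infty$ with `Double.POSITIVE_INFINITY`, and encodes the unary constraints through the zero variable as described above.

```java
import java.util.Arrays;

// A DBM stored 0-based: index 0 stands for v1 (always 0) and index k for v_{k+1}.
// Entry m[i][j] is the bound c of the constraint x_j - x_i <= c.
final class DbmBuilder {
    static final double INF = Double.POSITIVE_INFINITY;   // "no constraint"
    final double[][] m;

    DbmBuilder(int n) {
        m = new double[n][n];
        for (double[] row : m) Arrays.fill(row, INF);
    }

    // Potential constraint x_j - x_i <= c. Keeping the minimum merges two bounds
    // on the same difference into the tighter one, which is why one right member
    // per left member can be assumed without loss of generality.
    DbmBuilder potential(int j, int i, double c) {
        m[i][j] = Math.min(m[i][j], c);
        return this;
    }

    // x_k <= c is encoded as x_k - x_1 <= c (index 0 is the zero variable).
    DbmBuilder upper(int k, double c) { return potential(k, 0, c); }

    // x_k >= c is encoded as x_1 - x_k <= -c.
    DbmBuilder lower(int k, double c) { return potential(0, k, -c); }
}
```

For instance, `new DbmBuilder(3).upper(1, 4).lower(1, -1)` stores 4 at `m[0][1]` and 1 at `m[1][0]`, and every pair without a constraint keeps $+\infty$.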



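Potential Graphs. A DBM m can be seen as the adjacency matrix of a directed
graph $\mathcal{G} = (\mathcal{V}, \mathcal{A}, w)$ with edges weighted in $\mathbb{I}$. $\mathcal{V}$ is the set of nodes, $\mathcal{A} \subseteq \mathcal{V}^2$ is
the set of edges and $w \in \mathcal{A} \to \mathbb{I}$ is the weight function. $\mathcal{G}$ is defined by:

$$ \begin{cases} (v_i, v_j) \notin \mathcal{A} & \text{if } m_{ij} = +\infty, \\ (v_i, v_j) \in \mathcal{A} \text{ and } w(v_i, v_j) = m_{ij} & \text{if } m_{ij} \neq +\infty. \end{cases} $$

We will denote by $\langle i_1, \ldots, i_k \rangle$ a finite sequence of nodes representing a path from
node $v_{i_1}$ to node $v_{i_k}$ in $\mathcal{G}$. A cycle is a path such that $i_1 = i_k$.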



$\mathcal{D}$-Domain and $\mathcal{D}^0$-Domain. We call the $\mathcal{D}$-domain of a DBM m, and
denote by $\mathcal{D}(m)$, the set of points in $\mathbb{I}^n$ that satisfy all potential constraints:

$$ \mathcal{D}(m) = \{ (x_1, \ldots, x_n) \in \mathbb{I}^n \mid \forall i, j,\ x_j - x_i \le m_{ij} \}. $$

Now, remember that the variable $v_1$ has a special semantics: it is always
equal to 0. Thus, it is not the $\mathcal{D}$-domain which is of interest, but the $\mathcal{D}^0$-domain
(which is a sort of intersection-projection of the $\mathcal{D}$-domain), denoted by $\mathcal{D}^0(m)$
and defined by:

$$ \mathcal{D}^0(m) = \{ (x_2, \ldots, x_n) \in \mathbb{I}^{n-1} \mid (0, x_2, \ldots, x_n) \in \mathcal{D}(m) \}. $$

We will call $\mathcal{D}$-domain and $\mathcal{D}^0$-domain any subset of $\mathbb{I}^n$ or $\mathbb{I}^{n-1}$ which is,
respectively, the $\mathcal{D}$-domain or the $\mathcal{D}^0$-domain of some DBM. Figure 1 shows an
example DBM together with its corresponding potential graph, constraint set,
$\mathcal{D}$-domain and $\mathcal{D}^0$-domain.
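Both domains can be tested point by point. Under a 0-based convention (index 0 for $v_1$, `Double.POSITIVE_INFINITY` for $+\infty$), the hypothetical `DbmDomain` helper below checks membership in $\mathcal{D}(m)$ by scanning all $n^2$ constraints, and checks $\mathcal{D}^0(m)$ by prefixing the point with 0. It is a sketch of the two definitions, not code from the paper.

```java
// Membership tests for the D-domain and the D0-domain of a DBM
// (0-based: index 0 is v1, and m[i][j] bounds x_j - x_i).
final class DbmDomain {
    // x is in D(m) iff x_j - x_i <= m[i][j] for every pair (i, j).
    static boolean inD(double[][] m, double[] x) {
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m.length; j++)
                if (x[j] - x[i] > m[i][j]) return false;   // never true for +infinity
        return true;
    }

    // (x2, ..., xn) is in D0(m) iff (0, x2, ..., xn) is in D(m).
    static boolean inD0(double[][] m, double[] rest) {
        double[] x = new double[rest.length + 1];          // x[0] = 0 stands for v1
        System.arraycopy(rest, 0, x, 1, rest.length);
        return inD(m, x);
    }
}
```

For the DBM encoding $v_2 \le 4$, $v_2 \ge -1$ and $v_3 - v_2 \le 3$, the point $(v_2, v_3) = (2, 5)$ is in the $\mathcal{D}^0$-domain while $(2, 6)$ is not; the $\mathcal{D}$-domain also contains translated points such as $(10, 12, 15)$.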



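$\sqsubseteq$ Order. The $\le$ order on $\overline{\mathbb{I}}$ induces a point-wise order $\sqsubseteq$ on the set of DBMs:

$$ m \sqsubseteq n \overset{\Delta}{\iff} \forall i, j,\ m_{ij} \le n_{ij}. $$

This order is partial. It is also complete if $\overline{\mathbb{I}}$ has least upper bounds, i.e., if $\mathbb{I}$ is $\mathbb{R}$
or $\mathbb{Z}$, but not $\mathbb{Q}$. We will denote by $=$ the associated equality relation, which is
simply matrix equality.

We have $m \sqsubseteq n \Longrightarrow \mathcal{D}^0(m) \subseteq \mathcal{D}^0(n)$, but the converse is not true. In
particular, we do not have $\mathcal{D}^0(m) = \mathcal{D}^0(n) \Longrightarrow m = n$ (see Figure 2 for a
counter-example).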
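The point-wise order is a direct entry-by-entry comparison. Figure 2 is not reproduced here, so the usage below relies on two hand-made matrices of our own (0-based, index 0 for $v_1$): both encode $v_2 \le 1$ and $v_3 - v_2 \le 0$ plus one redundant bound ($v_3 \le 5$ in the first, $v_2 - v_2 \le 0$ in the second), hence the same $\mathcal{D}^0$-domain, yet neither is below the other. The class name `DbmOrder` is hypothetical.

```java
// Point-wise order on DBMs: m is below n iff every bound in m is at most the
// corresponding bound in n (0-based indices, +infinity for "no constraint").
final class DbmOrder {
    static boolean leq(double[][] m, double[][] n) {
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m.length; j++)
                if (m[i][j] > n[i][j]) return false;   // +infinity is above every finite bound
        return true;
    }
}
```

With a = {{∞, 1, 5}, {∞, ∞, 0}, {∞, ∞, ∞}} and b = {{∞, 1, ∞}, {∞, 0, 0}, {∞, ∞, ∞}}, both `leq(a, b)` and `leq(b, a)` are false, illustrating that equal $\mathcal{D}^0$-domains do not imply comparable matrices.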
Fig. 2. Three different DBMs with the same $\mathcal{D}^0$-domain as in Figure 1. Note
that (a) and (b) are not even comparable with respect to $\sqsubseteq$.



3 Closure, Emptiness, Inclusion, and Equality Tests 

We saw in Figure 2 that two different DBMs can represent the same $\mathcal{D}^0$-domain.
In this section, we show that there exists a normal form for any DBM with a
non-empty $\mathcal{D}^0$-domain and present an algorithm to find it. The existence and
computability of a normal form is very important since it is, as often in abstract
representations, the key to equality testing used in fixpoint computation. In the
case of DBMs, it will also allow us to analyze the precision of the
operators defined in the next section.
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Emptiness Testing. We have the following graph-oriented theorem: 

Theorem 1. 

A DBM has an empty -domain if and only if there exists, in its associated 
potential graph, a cycle with a strictly negative total weight. □ 

Checking for cycles with a strictly negative weight is done using the well-known Bellman-Ford algorithm, which runs in $O(n^3)$. This algorithm can be found in Cormen, Leiserson and Rivest’s classical algorithmics textbook [2, §25.3].
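As an illustration of Theorem 1, the Bellman-Ford check can be sketched in Java. This is a minimal sketch under assumptions of our own, not the author's OCaml implementation: the class and method names are hypothetical, `m[i][j]` is taken to bound $v_j - v_i$, index 0 plays the role of the constant variable $v_1$, and a coefficient $+\infty$ (no edge in the potential graph) is encoded as `Double.POSITIVE_INFINITY`.

```java
// Hypothetical sketch (not the author's OCaml code). Assumed encoding:
// m[i][j] bounds v_j - v_i, index 0 is the zero variable v_1, and a
// coefficient of +infinity (no edge) is Double.POSITIVE_INFINITY.
final class DbmEmptiness {
    static final double INF = Double.POSITIVE_INFINITY;

    /** True iff the potential graph of m has a cycle of strictly negative weight. */
    static boolean isEmpty(double[][] m) {
        int n = m.length;
        // All distances start at 0, as if a virtual source had 0-weight edges
        // to every vertex; every cycle is then reachable from that source.
        double[] dist = new double[n];
        for (int pass = 0; pass < n - 1; pass++) {
            boolean changed = false;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (m[i][j] != INF && dist[i] + m[i][j] < dist[j]) {
                        dist[j] = dist[i] + m[i][j];
                        changed = true;
                    }
            if (!changed) return false; // distances are stable: no negative cycle
        }
        // A further improvement after n - 1 passes can only come from a negative cycle.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (m[i][j] != INF && dist[i] + m[i][j] < dist[j]) return true;
        return false;
    }
}
```

Starting every distance at 0 simulates an extra source vertex linked to all others, so a negative cycle anywhere in the potential graph is found; the pass loop over all $n^2$ coefficients gives the $O(n^3)$ bound.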



Closure and Normal Form. Let $m$ be a DBM with a non-empty $\mathcal{D}^0$-domain and $\mathcal{G}$ its associated potential graph. Since $\mathcal{G}$ has no cycle with a strictly negative weight, we can compute its shortest-path closure $\mathcal{G}^*$, the adjacency matrix of which will be denoted by $m^*$ and defined by:

$m^*_{ii} = 0$,

$m^*_{ij} = \min_{\langle i = i_1, i_2, \ldots, i_N = j \rangle} \sum_{k=1}^{N-1} m_{i_k i_{k+1}}$ if $i \neq j$.

The idea of closure relies on the fact that, if $\langle i = i_1, i_2, \ldots, i_N = j \rangle$ is a path from $v_i$ to $v_j$, then the constraint $v_j - v_i \le \sum_{k=1}^{N-1} m_{i_k i_{k+1}}$ can be derived from $m$ by adding the potential constraints $v_{i_{k+1}} - v_{i_k} \le m_{i_k i_{k+1}}$, $1 \le k \le N-1$.

This is an implicit potential constraint which does not appear directly in the DBM $m$. When computing the closure, we replace each potential constraint $v_j - v_i \le m_{ij}$, $i \neq j$, in $m$ by the tightest implicit constraint we can find, and each diagonal element by $0$ (which is indeed the only value $v_i - v_i$ can take). In Figure 2, for instance, (c) is the closure of both the (a) and (b) DBMs.

Theorem 2.

1. $m^* = \inf_{\trianglelefteq} \{ n \mid \mathcal{D}^0(n) = \mathcal{D}^0(m) \}$.

2. $\mathcal{D}^0(m)$ saturates $m^*$, that is to say:

$\forall i, j$ such that $m^*_{ij} < +\infty$, $\exists (x_1 = 0, x_2, \ldots, x_n) \in \mathcal{D}(m)$, $x_j - x_i = m^*_{ij}$.

□

Theorem 2.1 states that $m^*$ is the smallest DBM, with respect to $\trianglelefteq$, that represents a given $\mathcal{D}^0$-domain, and thus the closed form is a normal form. Theorem 2.2 is a crucial property to prove the accuracy of some operators defined in the next section.

Any shortest-path graph algorithm can be used to compute the closure of a DBM. We suggest the straightforward Floyd-Warshall algorithm, which is described in Cormen, Leiserson and Rivest’s textbook [2, §26.2] and has an $O(n^3)$ time cost.
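A possible Java rendering of the Floyd-Warshall closure is sketched below. The names are hypothetical and the encoding is our own assumption (`m[i][j]` bounds $v_j - v_i$, index 0 is $v_1$, `Double.POSITIVE_INFINITY` is $+\infty$). Returning `null` for an empty $\mathcal{D}^0$-domain is also our own convention; it relies on the standard fact that Floyd-Warshall leaves a negative diagonal entry exactly when the graph has a negative cycle.

```java
// Hypothetical sketch of DBM closure with Floyd-Warshall. Assumed encoding:
// m[i][j] bounds v_j - v_i, index 0 is the zero variable v_1,
// Double.POSITIVE_INFINITY stands for +infinity.
final class DbmClosure {
    /** Returns the closure m*, or null when the D0-domain of m is empty. */
    static double[][] close(double[][] m) {
        int n = m.length;
        double[][] c = new double[n][];
        for (int i = 0; i < n; i++) c[i] = m[i].clone(); // leave m untouched
        for (int k = 0; k < n; k++)                      // allow paths through v_k
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double viaK = c[i][k] + c[k][j];     // +infinity absorbs finite values
                    if (viaK < c[i][j]) c[i][j] = viaK;
                }
        for (int i = 0; i < n; i++) {
            if (c[i][i] < 0) return null; // negative cycle through v_i: empty (Theorem 1)
            c[i][i] = 0;                  // m*_ii = 0 by definition
        }
        return c;
    }
}
```

For instance, from $v_2 \le 3$ and $v_3 - v_2 \le 1$ the closure derives the implicit constraint $v_3 \le 4$, which does not appear in the input matrix.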
Equality and Inclusion Testing. The case where $m$ or $n$ or both have an empty $\mathcal{D}^0$-domain is easy; in all other cases we use the following theorem, which is a consequence of Theorem 2.1:

Theorem 3.

1. If $m$ and $n$ have non-empty $\mathcal{D}^0$-domains, $\mathcal{D}^0(m) = \mathcal{D}^0(n) \iff m^* = n^*$.

2. If $m$ and $n$ have non-empty $\mathcal{D}^0$-domains, $\mathcal{D}^0(m) \subseteq \mathcal{D}^0(n) \iff m^* \trianglelefteq n$.

□

Besides the emptiness test and closure, we may need, in order to test equality or inclusion, to compare matrices with respect to the point-wise ordering $\trianglelefteq$. This can be done with an $O(n^2)$ time cost.
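Combining closure with the point-wise comparison gives the tests of Theorem 3. The following Java sketch is hypothetical (names and encoding are our own: `m[i][j]` bounds $v_j - v_i$, index 0 is $v_1$, `Double.POSITIVE_INFINITY` is $+\infty$, and `null` marks an empty $\mathcal{D}^0$-domain) and handles the empty cases separately, as the text suggests.

```java
// Hypothetical sketch of the equality and inclusion tests of Theorem 3.
final class DbmCompare {
    /** Floyd-Warshall closure; returns null when the D0-domain is empty. */
    static double[][] close(double[][] m) {
        int n = m.length;
        double[][] c = new double[n][];
        for (int i = 0; i < n; i++) c[i] = m[i].clone();
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    c[i][j] = Math.min(c[i][j], c[i][k] + c[k][j]);
        for (int i = 0; i < n; i++) {
            if (c[i][i] < 0) return null;
            c[i][i] = 0;
        }
        return c;
    }

    /** Point-wise order: every m_ij <= n_ij, in O(n^2). */
    static boolean leq(double[][] m, double[][] n) {
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m.length; j++)
                if (m[i][j] > n[i][j]) return false;
        return true;
    }

    /** D0(m) subset of D0(n); Theorem 3.2 (m* <= n) in the non-empty case. */
    static boolean included(double[][] m, double[][] n) {
        double[][] mc = close(m);
        if (mc == null) return true;        // the empty set is included in anything
        if (close(n) == null) return false; // a non-empty set is not in the empty set
        return leq(mc, n);
    }

    /** D0(m) = D0(n); Theorem 3.1 (m* = n*) in the non-empty case. */
    static boolean sameDomain(double[][] m, double[][] n) {
        double[][] mc = close(m), nc = close(n);
        if (mc == null || nc == null) return mc == nc; // equal only if both empty
        return leq(mc, nc) && leq(nc, mc);
    }
}
```

Note that only the left argument of `included` needs to be closed, exactly as in Theorem 3.2.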

Projection. We define the projection $\pi_{|v_k}(m)$ of a DBM $m$ with respect to a variable $v_k$ to be the interval containing all possible values of $v \in \mathbb{I}$ such that there exists a point $(x_2, \ldots, x_n)$ in the $\mathcal{D}^0$-domain of $m$ with $x_k = v$:

$\pi_{|v_k}(m) = \{ x \in \mathbb{I} \mid \exists (x_2, \ldots, x_n) \in \mathcal{D}^0(m) \text{ such that } x = x_k \}$.

The following theorem, which is a consequence of the saturation property of the 
closure, gives an algorithmic way to compute the projection: 

Theorem 4.

If $m$ has a non-empty $\mathcal{D}^0$-domain, then $\pi_{|v_k}(m) = [-m^*_{k1}, m^*_{1k}]$

(interval bounds are included only if finite). □
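Theorem 4 reads the projection directly off the closed matrix. A minimal Java sketch, assuming a hypothetical encoding where the argument `c` is already closed, `c[i][j]` bounds $v_j - v_i$, index 0 stands for $v_1$, and infinite bounds are `Double.POSITIVE_INFINITY`:

```java
// Hypothetical sketch of Theorem 4; the input must already be closed.
final class DbmProjection {
    /**
     * Returns {lo, hi} for v_k (0-based index k). The constraint
     * v_k - v_1 <= c[0][k] gives the upper bound and v_1 - v_k <= c[k][0]
     * the lower one; an infinite bound means that side is open.
     */
    static double[] project(double[][] c, int k) {
        return new double[] {-c[k][0], c[0][k]};
    }
}
```

Applied to a non-closed DBM, the same lookup can miss implicit bounds (for example $v_3 \le 4$, derived from $v_2 \le 3$ and $v_3 - v_2 \le 1$), which is why the theorem uses $m^*$.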

4 Operators and Transfer Functions 

In this section, we define some operators and transfer functions to be used in abstract semantics. Except for the intersection operator, they are new. The operators are basically point-wise extensions of the standard operators defined over the domain of intervals.

Most algorithms presented here are either constant time, or point-wise, i.e., 
quadratic time. 




Intersection. Let us define the point-wise intersection DBM $m \wedge n$ by:

$(m \wedge n)_{ij} = \min(m_{ij}, n_{ij})$.

We have the following theorem:

Theorem 5.

$\mathcal{D}^0(m \wedge n) = \mathcal{D}^0(m) \cap \mathcal{D}^0(n)$. □

stating that the intersection is always exact. However, the resulting DBM is seldom closed, even if the arguments are closed.
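The intersection is a single point-wise pass. A hypothetical Java sketch, using `double[][]` matrices where `m[i][j]` is assumed to bound $v_j - v_i$ and `Double.POSITIVE_INFINITY` encodes $+\infty$:

```java
// Hypothetical sketch of the point-wise intersection of Theorem 5.
final class DbmMeet {
    /** (m meet n)_ij = min(m_ij, n_ij); exact on D0-domains, O(n^2). */
    static double[][] meet(double[][] m, double[][] n) {
        int size = m.length;
        double[][] r = new double[size][size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                r[i][j] = Math.min(m[i][j], n[i][j]);
        return r;
    }
}
```

For instance, intersecting the closed DBMs for $v_3 - v_2 \le 1$ and $v_2 \le 2$ yields a matrix whose entry for $v_3 - v_1$ is still $+\infty$; only a subsequent closure exposes the implicit bound $v_3 \le 3$, which illustrates why the result is seldom closed.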
Least Upper Bound. The set of $\mathcal{D}^0$-domains is not stable under union¹, so we introduce here a union operator which over-approximates its result. We define the point-wise least upper bound DBM $m \vee n$ by:

$(m \vee n)_{ij} = \max(m_{ij}, n_{ij})$.

$m \vee n$ is indeed the least upper bound with respect to the $\trianglelefteq$ order. The following theorem tells us about the effect of this operator on $\mathcal{D}^0$-domains:

Theorem 6.

1. $\mathcal{D}^0(m \vee n) \supseteq \mathcal{D}^0(m) \cup \mathcal{D}^0(n)$.

2. If $m$ and $n$ have non-empty $\mathcal{D}^0$-domains, then

$(m^*) \vee (n^*) = \inf_{\trianglelefteq} \{ o \mid \mathcal{D}^0(o) \supseteq \mathcal{D}^0(m) \cup \mathcal{D}^0(n) \}$

and, as a consequence, $\mathcal{D}^0((m^*) \vee (n^*))$ is the smallest $\mathcal{D}^0$-domain (with respect to the $\subseteq$ ordering) which contains $\mathcal{D}^0(m) \cup \mathcal{D}^0(n)$.

3. If $m$ and $n$ are closed, then so is $m \vee n$.

□

Theorem 6.1 states that $\mathcal{D}^0(m \vee n)$ is an upper bound in the set of $\mathcal{D}^0$-domains with respect to the $\subseteq$ order. If precision is a concern, we need to find the least upper bound in this set. Theorem 6.2, which is a consequence of the saturation property of the closure, states that we have to close both arguments before applying the $\vee$ operator to get this most precise union over-approximation. If one argument has an empty $\mathcal{D}^0$-domain, the least upper bound we want is simply the other argument. Emptiness tests and closure add an $O(n^3)$ time cost.
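Following Theorem 6.2, a precise join closes both arguments before taking the point-wise maximum. The Java sketch below is illustrative only: names are hypothetical, `m[i][j]` is assumed to bound $v_j - v_i$ with index 0 for $v_1$, and `null` is our own encoding of an empty $\mathcal{D}^0$-domain.

```java
// Hypothetical sketch of the most precise union over-approximation.
final class DbmJoin {
    /** Floyd-Warshall closure; null encodes an empty D0-domain. */
    static double[][] close(double[][] m) {
        int n = m.length;
        double[][] c = new double[n][];
        for (int i = 0; i < n; i++) c[i] = m[i].clone();
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    c[i][j] = Math.min(c[i][j], c[i][k] + c[k][j]);
        for (int i = 0; i < n; i++) {
            if (c[i][i] < 0) return null;
            c[i][i] = 0;
        }
        return c;
    }

    /** Close both arguments, then take the point-wise maximum (null if both empty). */
    static double[][] join(double[][] m, double[][] n) {
        double[][] mc = close(m), nc = close(n);
        if (mc == null) return nc; // empty m: the result is (the closure of) n
        if (nc == null) return mc;
        int size = mc.length;
        double[][] r = new double[size][size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                r[i][j] = Math.max(mc[i][j], nc[i][j]);
        return r; // already closed, by Theorem 6.3
    }
}
```

Joining the points $v_2 = v_3 = 0$ and $v_2 = v_3 = 1$ this way keeps the relational invariant $v_2 = v_3$ (the entries for $v_3 - v_2$ and $v_2 - v_3$ both stay 0), which an interval join would lose.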



Widening. When computing the semantics of a program, one often encounters loops leading to fixpoint computations involving infinite iteration sequences. In order to compute in finite time an upper approximation of a fixpoint, widening operators were introduced in P. Cousot’s thesis (§4.1.2.0.4). Widening is a sort of union for which every increasing chain is stationary after a finite number of iterations. We define the point-wise widening operator $\nabla$ by:

$(m \,\nabla\, n)_{ij} = \begin{cases} m_{ij} & \text{if } n_{ij} \le m_{ij}, \\ +\infty & \text{elsewhere.} \end{cases}$

The following properties prove that $\nabla$ is indeed a widening:

Theorem 7.

1. $\mathcal{D}^0(m \,\nabla\, n) \supseteq \mathcal{D}^0(m) \cup \mathcal{D}^0(n)$.

2. Finite chain property: for every $m$ and every sequence $(n_i)_{i \in \mathbb{N}}$, the chain defined by

$x_0 = m$, $\quad x_{i+1} = x_i \,\nabla\, n_i$

is increasing for $\trianglelefteq$ and ultimately stationary. The limit $l$ is such that $l \trianglerighteq m$ and $\forall i,\ l \trianglerighteq n_i$.

□

¹ $\mathcal{D}^0$-domains are always convex, but the union of two $\mathcal{D}^0$-domains may not be convex.

The widening operator has some intriguing interactions with closure. Like the least upper bound, the widening operator gives more precise results if its right argument is closed, so it is rewarding to change $x_{i+1} = x_i \,\nabla\, n_i$ into $x_{i+1} = x_i \,\nabla\, (n_i^*)$. This is not the case for the first argument: we can sometimes have $\mathcal{D}^0(m \,\nabla\, n) \subsetneq \mathcal{D}^0((m^*) \,\nabla\, n)$. Worse, if we try to force the closure of the first argument by changing $x_{i+1} = x_i \,\nabla\, n_i$ into $x_{i+1} = (x_i \,\nabla\, n_i)^*$, the finite chain property (Theorem 7.2) is no longer satisfied, as illustrated in Figure 3.
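A minimal Java sketch of the point-wise widening, under a hypothetical encoding (`double[][]` matrices, `m[i][j]` bounding $v_j - v_i$, `Double.POSITIVE_INFINITY` for $+\infty$). In line with the discussion above, a caller would pass a closed right argument $n_i^*$ and would not close the result; the closure step itself is left out of this sketch.

```java
// Hypothetical sketch of the point-wise DBM widening.
final class DbmWidening {
    static final double INF = Double.POSITIVE_INFINITY;

    /** (m widen n)_ij = m_ij if n_ij <= m_ij, +infinity otherwise. */
    static double[][] widen(double[][] m, double[][] n) {
        int size = m.length;
        double[][] r = new double[size][size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                r[i][j] = n[i][j] <= m[i][j] ? m[i][j] : INF; // unstable bound jumps to +infinity
        return r;
    }
}
```

Starting from $v_2 = 0$ and widening with $0 \le v_2 \le 1$, the unstable upper bound jumps to $+\infty$ while the stable lower bound 0 is kept; a further widening with $0 \le v_2 \le 2$ leaves the matrix unchanged, so the chain is stationary.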




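Originally, Cousot and Cousot defined widening over intervals, written $\nabla_{\mathrm{int}}$ here to distinguish it from our $\nabla$, by:

$[a, b] \,\nabla_{\mathrm{int}}\, [c, d] = [e, f]$,

where:

$e = \begin{cases} a & \text{if } a \le c, \\ -\infty & \text{elsewhere,} \end{cases} \qquad f = \begin{cases} b & \text{if } b \ge d, \\ +\infty & \text{elsewhere.} \end{cases}$

The following theorem proves that the sequence computed by our widening is always more precise than the one computed with the standard widening over intervals: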
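Theorem 8.

If we have the following iterating sequences:

$x_0 = m^*$, $\quad x_{k+1} = x_k \,\nabla\, (n_k^*)$,

$[y_0, z_0] = \pi_{|v_i}(m)$, $\quad [y_{k+1}, z_{k+1}] = [y_k, z_k] \,\nabla_{\mathrm{int}}\, \pi_{|v_i}(n_k)$,

where $\nabla_{\mathrm{int}}$ is the standard widening over intervals, then the sequence $(x_k)_{k \in \mathbb{N}}$ is more precise than the sequence $([y_k, z_k])_{k \in \mathbb{N}}$ in the following sense:

$\forall k,\ \pi_{|v_i}(x_k) \subseteq [y_k, z_k]$.

□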



Remark that the technique described in Cousot and Cousot’s PLILP’92 paper for improving the precision of the standard widening over intervals $\nabla_{\mathrm{int}}$ can also be applied to our widening $\nabla$. It allows, for instance, deriving a widening that always gives better results than a simple sign analysis (which is the case for neither $\nabla$ nor $\nabla_{\mathrm{int}}$). The resulting widening over DBMs remains more precise than the resulting widening over intervals.



Narrowing. Narrowing operators were introduced in P. Cousot’s thesis (§4.1.2.0.11) in order to restore, in finite time, some information that may have been lost by widening applications. We define here a point-wise narrowing operator $\triangle$ by:

$(m \,\triangle\, n)_{ij} = \begin{cases} n_{ij} & \text{if } m_{ij} = +\infty, \\ m_{ij} & \text{elsewhere.} \end{cases}$

The following properties prove that $\triangle$ is indeed a narrowing:

Theorem 9.

1. If $\mathcal{D}^0(n) \subseteq \mathcal{D}^0(m)$, then $\mathcal{D}^0(n) \subseteq \mathcal{D}^0(m \,\triangle\, n) \subseteq \mathcal{D}^0(m)$.

2. Finite decreasing chain property: for every $m$ and every chain $(n_i)_{i \in \mathbb{N}}$ decreasing for $\trianglelefteq$, the chain defined by

$x_0 = m$, $\quad x_{i+1} = x_i \,\triangle\, n_i$

is decreasing and ultimately stationary.

□

Given a sequence $(n_k)_{k \in \mathbb{N}}$ such that the chain $(\mathcal{D}^0(n_k))_{k \in \mathbb{N}}$ is decreasing for the $\subseteq$ partial order (but $(n_k)_{k \in \mathbb{N}}$ is not decreasing for the $\trianglelefteq$ partial order), one way to ensure the best accuracy as well as the finiteness of the chain is to force the closure of the right argument by changing $x_{i+1} = x_i \,\triangle\, n_i$ into $x_{i+1} = x_i \,\triangle\, (n_i^*)$. Unlike widening, forcing all elements in the chain to be closed with $x_{i+1} = (x_i \,\triangle\, n_i)^*$ poses no problem.
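The narrowing only refines coefficients that widening pushed to $+\infty$. A hypothetical Java sketch with the same kind of assumed encoding (`double[][]`, `Double.POSITIVE_INFINITY` for $+\infty$):

```java
// Hypothetical sketch of the point-wise DBM narrowing.
final class DbmNarrowing {
    static final double INF = Double.POSITIVE_INFINITY;

    /** (m narrow n)_ij = n_ij if m_ij = +infinity, m_ij otherwise. */
    static double[][] narrow(double[][] m, double[][] n) {
        int size = m.length;
        double[][] r = new double[size][size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                r[i][j] = m[i][j] == INF ? n[i][j] : m[i][j]; // only infinite bounds are refined
        return r;
    }
}
```

After a widening produced $v_2 \in [0, +\infty)$, narrowing with a matrix for $0 \le v_2 \le 10$ restores the bound 10; narrowing again with the same matrix changes nothing, since no coefficient is infinite any more.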
Forget. Given a DBM $m$ and a variable $v_k$, the forget operator $m_{\setminus v_k}$ computes a DBM where all information about $v_k$ is lost. It is the opposite of the projection operator $\pi_{|v_k}$. We define this operator by:

$(m_{\setminus v_k})_{ij} = \begin{cases} \min(m_{ij}, m_{ik} + m_{kj}) & \text{if } i \neq k \text{ and } j \neq k, \\ 0 & \text{if } i = j = k, \\ +\infty & \text{elsewhere.} \end{cases}$

The $\mathcal{D}^0$-domain of $m_{\setminus v_k}$ is obtained by projecting $\mathcal{D}^0(m)$ on the subspace orthogonal to $\vec{v_k}$, and then extruding the result in the direction of $\vec{v_k}$:

Theorem 10.

$\mathcal{D}^0(m_{\setminus v_k}) = \{ (x_2, \ldots, x_n) \in \mathbb{I}^{n-1} \mid \exists x \in \mathbb{I},\ (x_2, \ldots, x_{k-1}, x, x_{k+1}, \ldots, x_n) \in \mathcal{D}^0(m) \}$.

□
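A hypothetical Java sketch of the forget operator, with `m[i][j]` assumed to bound $v_j - v_i$, 0-based indices (index 0 for $v_1$), and `Double.POSITIVE_INFINITY` for $+\infty$:

```java
// Hypothetical sketch of the forget operator.
final class DbmForget {
    static final double INF = Double.POSITIVE_INFINITY;

    /** m without v_k: keep constraints implied through v_k, drop all bounds on v_k. */
    static double[][] forget(double[][] m, int k) {
        int n = m.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i != k && j != k)
                    r[i][j] = Math.min(m[i][j], m[i][k] + m[k][j]); // one closure step via v_k
                else if (i == k && j == k)
                    r[i][j] = 0;
                else
                    r[i][j] = INF;                                  // v_k is now unconstrained
            }
        return r;
    }
}
```

With $v_2 \le 2$ and $v_3 - v_2 \le 1$, forgetting $v_2$ keeps the implied bound $v_3 \le 3$ through the $\min(m_{ij}, m_{ik} + m_{kj})$ term.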



Guard. Given an arithmetic equality or inequality $g$ over $\{v_2, \ldots, v_n\}$, which we call a guard, and a DBM $m$, the guard transfer function tries to find a new DBM $m_{(g)}$ the $\mathcal{D}^0$-domain of which is $\{ s \in \mathcal{D}^0(m) \mid s \text{ satisfies } g \}$. Since this is, in general, impossible, we will only try to have:

Theorem 11.

$\mathcal{D}^0(m_{(g)}) \supseteq \{ s \in \mathcal{D}^0(m) \mid s \text{ satisfies } g \}$. □

Here is an example definition:

Definition 12.

1. If $g = (v_{j_0} - v_{i_0} \le c)$ with $i_0 \neq j_0$, then:

$(m_{(g)})_{ij} = \begin{cases} \min(m_{ij}, c) & \text{if } i = i_0 \text{ and } j = j_0, \\ m_{ij} & \text{elsewhere.} \end{cases}$

The cases $g = (v_{j_0} \le c)$ and $g = (-v_{i_0} \le c)$ are settled by choosing respectively $i_0 = 1$ and $j_0 = 1$.

2. If $g = (v_{j_0} - v_{i_0} = c)$ with $i_0 \neq j_0$, then:

$m_{(g)} = (m_{(v_{j_0} - v_{i_0} \le c)})_{(v_{i_0} - v_{j_0} \le -c)}$.

The case $g = (v_{j_0} = c)$ is a special case where $i_0 = 1$.

3. In all other cases, we simply choose:

$m_{(g)} = m$.

□

In all but the last (general) case, the guard transfer function is exact.
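Cases 1 and 2 of Definition 12 can be sketched in Java as follows. Names are hypothetical and the encoding is assumed: `m[i][j]` bounds $v_j - v_i$ and index 0 encodes $v_1$, so $v_{j_0} \le c$ becomes `guardLeq(m, 0, j0, c)`. Case 3, the identity, needs no code.

```java
// Hypothetical sketch of the guard transfer function (Definition 12, cases 1-2).
final class DbmGuard {
    static double[][] copy(double[][] m) {
        double[][] r = new double[m.length][];
        for (int i = 0; i < m.length; i++) r[i] = m[i].clone();
        return r;
    }

    /** Case 1: g = (v_j0 - v_i0 <= c), i0 != j0; index 0 encodes v_1 = 0. */
    static double[][] guardLeq(double[][] m, int i0, int j0, double c) {
        double[][] r = copy(m);
        r[i0][j0] = Math.min(r[i0][j0], c); // tighten a single coefficient
        return r;
    }

    /** Case 2: g = (v_j0 - v_i0 = c), split into two inequalities. */
    static double[][] guardEq(double[][] m, int i0, int j0, double c) {
        // v_j0 - v_i0 <= c, then v_i0 - v_j0 <= -c (roles of i0 and j0 swapped)
        return guardLeq(guardLeq(m, i0, j0, c), j0, i0, -c);
    }
}
```

Both cases touch at most two coefficients, so they run in constant time once the matrix is copied.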
Assignment. An assignment $v_k \leftarrow e(v_2, \ldots, v_n)$ is defined by a variable $v_k$ and an arithmetic expression $e$ over $\{v_2, \ldots, v_n\}$.

Given a DBM $m$ representing all possible values that the variables $\{v_2, \ldots, v_n\}$ can take at a program point, we look for a DBM, denoted by $m_{(v_k \leftarrow e)}$, representing the possible values of the same variables after the assignment $v_k \leftarrow e$. This is not possible in the general case, so the assignment transfer function will only try to find an upper approximation of this set:

Theorem 13.

$\mathcal{D}^0(m_{(v_k \leftarrow e)}) \supseteq \{ (x_2, \ldots, x_{k-1}, e(x_2, \ldots, x_n), x_{k+1}, \ldots, x_n) \mid (x_2, \ldots, x_n) \in \mathcal{D}^0(m) \}$. □

For instance, we can use the following definition for $m_{(v_{i_0} \leftarrow e)}$:

Definition 14.

1. If $e = v_{i_0} + c$, then:

$(m_{(v_{i_0} \leftarrow e)})_{ij} = \begin{cases} m_{ij} - c & \text{if } i = i_0 \text{ and } j \neq i_0, \\ m_{ij} + c & \text{if } i \neq i_0 \text{ and } j = i_0, \\ m_{ij} & \text{elsewhere.} \end{cases}$

2. If $e = v_{j_0} + c$ with $i_0 \neq j_0$, then we use the forget operator and the guard transfer function:

$m_{(v_{i_0} \leftarrow e)} = ((m_{\setminus v_{i_0}})_{(v_{i_0} - v_{j_0} \le c)})_{(v_{j_0} - v_{i_0} \le -c)}$.

The case $e = c$ is a special case where we choose $j_0 = 1$.

3. In all other cases, we use standard interval arithmetic to find an interval $[-e^-, e^+]$, with $e^+, e^- \in \mathbb{I}$, such that

$[-e^-, e^+] \supseteq e(\pi_{|v_2}(m), \ldots, \pi_{|v_n}(m))$

and then we define:

$(m_{(v_{i_0} \leftarrow e)})_{ij} = \begin{cases} e^+ & \text{if } i = 1 \text{ and } j = i_0, \\ e^- & \text{if } j = 1 \text{ and } i = i_0, \\ (m_{\setminus v_{i_0}})_{ij} & \text{elsewhere.} \end{cases}$

□

In all but the last (general) case, the assignment transfer function is exact.
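Cases 1 and 2 of Definition 14 are sketched below in Java. Names and encoding are hypothetical (`m[i][j]` bounds $v_j - v_i$, index 0 is $v_1$, `Double.POSITIVE_INFINITY` is $+\infty$). Case 2 inlines the forget operator and the two guards; passing `j0 = 0` gives the constant assignment $v_{i_0} \leftarrow c$. The interval-arithmetic fallback of case 3 is omitted.

```java
// Hypothetical sketch of the assignment transfer function (Definition 14, cases 1-2).
final class DbmAssign {
    static final double INF = Double.POSITIVE_INFINITY;

    static double[][] copy(double[][] m) {
        double[][] r = new double[m.length][];
        for (int i = 0; i < m.length; i++) r[i] = m[i].clone();
        return r;
    }

    /** Case 1: v_i0 <- v_i0 + c shifts every constraint mentioning v_i0. */
    static double[][] assignShift(double[][] m, int i0, double c) {
        double[][] r = copy(m);
        for (int j = 0; j < m.length; j++) {
            if (j == i0) continue;
            r[i0][j] = m[i0][j] - c; // bound on v_j - v_i0 decreases by c
            r[j][i0] = m[j][i0] + c; // bound on v_i0 - v_j increases by c
        }
        return r;
    }

    /** Case 2: v_i0 <- v_j0 + c with i0 != j0; j0 = 0 gives v_i0 <- c. */
    static double[][] assignCopy(double[][] m, int i0, int j0, double c) {
        int n = m.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)          // forget v_i0
            for (int j = 0; j < n; j++)
                r[i][j] = (i != i0 && j != i0) ? Math.min(m[i][j], m[i][i0] + m[i0][j])
                        : (i == j ? 0 : INF);
        r[j0][i0] = Math.min(r[j0][i0], c);  // guard v_i0 - v_j0 <= c
        r[i0][j0] = Math.min(r[i0][j0], -c); // guard v_j0 - v_i0 <= -c
        return r;
    }
}
```

For example, with $1 \le v_2 \le 3$, the assignment $v_2 \leftarrow v_2 + 2$ yields $3 \le v_2 \le 5$, and $v_3 \leftarrow v_2 + 1$ records $v_3 - v_2 = 1$ exactly.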
Comparison with the Abstract Domain of Intervals. Most of the time, the precision of numerical abstract domains can only be compared experimentally on example programs (see Section 6 for such an example). However, we claim that the DBM domain always performs better than the domain of intervals.

To support this assertion, we informally compare the effect of all abstract operations in the DBM and in the interval domains. Thanks to Theorems 5 and 6.2, and Definitions 12 and 14, the intersection and union abstract operators and the guard and assignment transfer functions are more precise than their interval counterparts. Thanks to Theorem 8, approximate fixpoint computation with our widening $\nabla$ is always more accurate than with the standard widening over intervals, and one could easily prove that each iteration with our narrowing is more precise than with the standard narrowing over intervals. This means that any abstract semantics based on the operators and transfer functions we defined is always more precise than the corresponding interval-based abstract semantics.



5 Lattice Structures 

In this section, we design two lattice structures: one on the set of DBMs and one on the set of closed DBMs. The first one is useful to analyze fixpoint transfer between abstract and concrete semantics, and the second one allows us to design a meaning function, or even a Galois connection, linking the set of abstract $\mathcal{D}^0$-domains to the concrete lattice $\mathcal{P}(\{v_2, \ldots, v_n\} \to \mathbb{I})$, following the abstract interpretation framework described in Cousot and Cousot’s POPL’79 paper.



DBM Lattice. The set $\mathcal{M}$ of DBMs, together with the order relation $\trianglelefteq$ and the point-wise least upper bound $\vee$ and greatest lower bound $\wedge$, is almost a lattice. It only needs a least element $\bot$, so we extend $\trianglelefteq$, $\vee$ and $\wedge$ to $\mathcal{M}_\bot = \mathcal{M} \cup \{\bot\}$ in an obvious way to get $\sqsubseteq$, $\sqcup$ and $\sqcap$. The greatest element $\top$ is the DBM with all its coefficients equal to $+\infty$.

Theorem 15.

1. $(\mathcal{M}_\bot, \sqsubseteq, \sqcap, \sqcup, \bot, \top)$ is a lattice.

2. This lattice is complete if $(\mathbb{I}, \le)$ is complete ($\mathbb{Z}$ or $\mathbb{R}$, but not $\mathbb{Q}$).

□

There are, however, two problems with this lattice. First, we cannot easily assimilate this lattice to a sub-lattice of $\mathcal{P}(\{v_2, \ldots, v_n\} \to \mathbb{I})$, as two different DBMs can have the same $\mathcal{D}^0$-domain. Second, the least upper bound operator $\sqcup$ is not the most precise upper approximation of the union of two $\mathcal{D}^0$-domains, because we do not force the arguments to be closed.
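Closed DBM Lattice. To overcome these difficulties, we build another lattice based on closed DBMs. First, consider the set of closed DBMs $\mathcal{M}^*$ with a least element $\bot^*$ added, giving $\mathcal{M}^*_\bot$. Now, we define a greatest element $\top^*$, a partial order relation $\sqsubseteq^*$, a least upper bound $\sqcup^*$ and a greatest lower bound $\sqcap^*$ in $\mathcal{M}^*_\bot$ by:

$\top^*_{ij} = \begin{cases} 0 & \text{if } i = j, \\ +\infty & \text{elsewhere.} \end{cases}$

$m \sqsubseteq^* n \iff \text{either } m = \bot^*, \text{ or } m \neq \bot^*,\ n \neq \bot^* \text{ and } m \trianglelefteq n$.

$m \sqcup^* n = \begin{cases} n & \text{if } m = \bot^*, \\ m & \text{if } n = \bot^*, \\ m \vee n & \text{elsewhere.} \end{cases}$

$m \sqcap^* n = \begin{cases} \bot^* & \text{if } m = \bot^* \text{ or } n = \bot^* \text{ or } \mathcal{D}^0(m \wedge n) = \emptyset, \\ (m \wedge n)^* & \text{elsewhere.} \end{cases}$

Thanks to Theorem 2.1, every non-empty $\mathcal{D}^0$-domain has a unique representation in $\mathcal{M}^*$; $\bot^*$ is the representation of the empty set. We build a meaning function $\gamma$ which is an extension of $\mathcal{D}^0$ to $\mathcal{M}^*_\bot$:

$\gamma(m) = \begin{cases} \emptyset & \text{if } m = \bot^*, \\ \mathcal{D}^0(m) & \text{elsewhere.} \end{cases}$

Theorem 16.

1. $(\mathcal{M}^*_\bot, \sqsubseteq^*, \sqcap^*, \sqcup^*, \bot^*, \top^*)$ is a lattice and $\gamma$ is one-to-one.

2. If $(\mathbb{I}, \le)$ is complete, this lattice is complete and $\gamma$ is meet-preserving: $\gamma(\sqcap^* X) = \bigcap \{ \gamma(x) \mid x \in X \}$. We can, according to Cousot and Cousot (Prop. 7), build a canonical Galois insertion:

$\mathcal{P}(\{v_2, \ldots, v_n\} \to \mathbb{I}) \underset{\gamma}{\overset{\alpha}{\rightleftarrows}} \mathcal{M}^*_\bot$

where the abstraction function $\alpha$ is defined by:

$\alpha(D) = \sqcap^* \{ m \in \mathcal{M}^*_\bot \mid D \subseteq \gamma(m) \}$.

□

The $\mathcal{M}^*_\bot$ lattice features a nice meaning function and a precise union approximation; thus, it is tempting to force all our operators and transfer functions to live in $\mathcal{M}^*_\bot$ by forcing closure on their result. However, we saw that this does not work for widening, so fixpoint computation must be performed in the $\mathcal{M}_\bot$ lattice.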

6 Results 

The algorithms on DBMs presented here have been implemented in OCaml and used to perform forward analysis on toy, yet Turing-equivalent, imperative and parallel languages with only numerical variables and no procedures.

We present here neither the concrete and abstract semantics, nor the actual forward analysis algorithm used for our analyzers. They follow exactly the abstract interpretation scheme described in Cousot and Cousot’s POPL’79 paper and Bourdoncle’s FMPA’93 paper, and are detailed in the author’s MS thesis. The theorems of Section 4 prove that all the operators and transfer functions we defined are indeed abstractions, on the domain of DBMs, of the usual operators and transfer functions on the concrete domain $\mathcal{P}(\{v_2, \ldots, v_n\} \to \mathbb{I})$, which, as shown by Cousot and Cousot, is sufficient to prove soundness for analyses.

Imperative Programs. Our toy forward analyzer for an imperative language follows almost exactly the analyzer described in Cousot and Halbwachs’s POPL’78 paper, except that the abstract domain of polyhedra has been replaced by our DBM-based domain. We tested our analyzer on the well-known Bubble Sort and Heap Sort algorithms and managed to prove automatically that they do not produce out-of-bound errors while accessing array elements. Although we did not find as many invariants as Cousot and Halbwachs for these two examples, it was sufficient to prove correctness. We do not detail these common examples here for the sake of brevity.

Parallel Programs. Our toy analyzer for a parallel language allows analyzing a fixed set of processes running concurrently and communicating through global variables. We use the well-known nondeterministic interleaving method in order to analyze all possible control flows. In this context, we managed to prove automatically that the Bakery algorithm (introduced in 1974 by Lamport), used to synchronize two parallel processes, never lets the two processes be in their critical sections at the same time. We now detail this example.

The Bakery Algorithm. After the initialization of two global shared variables y1 and y2, two processes p1 and p2 are spawned. They synchronize through the variables y1 and y2, representing the priority of p1 and p2, so that only one process at a time can enter its critical section (Figure 4).

Our analyzer for parallel processes is fed with the initialization code (y1 = 0; y2 = 0) and the control flow graphs for p1 and p2 (Figure 5). Each control graph is a set of control point nodes and some edges labeled with either an action performed when the edge is taken (the assignment y1 ← y2 + 1, for example) or a guard imposing a condition for taking the edge (the test y1 ≠ 0, for example).

The analyzer then computes the nondeterministic interleaving of p1 and p2, which is the product control flow graph. Then, it iteratively computes the abstract invariants holding at each product control point. It outputs the invariants shown in Figure 6.
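The product construction itself is simple to sketch. The following Java code is a hypothetical illustration, not the analyzer described here: it only builds the interleaving product of two labelled control flow graphs, where each product edge moves exactly one process, and it does not compute any invariant. The `Edge` class and the string encoding of product nodes are our own choices.

```java
// Hypothetical sketch: nondeterministic interleaving of two labelled CFGs.
final class InterleavingProduct {
    /** A labelled CFG edge: src --label--> dst. */
    static final class Edge {
        final String src, label, dst;
        Edge(String src, String label, String dst) {
            this.src = src; this.label = label; this.dst = dst;
        }
        @Override public String toString() { return src + " --[" + label + "]--> " + dst; }
    }

    static String pair(String p, String q) { return "(" + p + "," + q + ")"; }

    /** Product edges: p1 moves while p2 stays put, or p2 moves while p1 stays put. */
    static java.util.List<Edge> product(java.util.List<String> nodes1, java.util.List<Edge> e1,
                                        java.util.List<String> nodes2, java.util.List<Edge> e2) {
        java.util.List<Edge> out = new java.util.ArrayList<>();
        for (Edge e : e1)
            for (String q : nodes2)
                out.add(new Edge(pair(e.src, q), "p1: " + e.label, pair(e.dst, q)));
        for (Edge e : e2)
            for (String p : nodes1)
                out.add(new Edge(pair(p, e.src), "p2: " + e.label, pair(p, e.dst)));
        return out;
    }
}
```

For two processes with three control points and four edges each, this gives 9 product nodes and 24 product edges; an abstract interpreter would then propagate DBMs along these edges.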

The state (2, c) is never reached, which means that p1 and p2 cannot be in their critical sections at the same time. This proves the correctness of the Bakery algorithm. Remark that our analyzer also discovered some non-obvious invariants, such as y1 = y2 + 1 holding in the (1, c) state.
y1 = 0; y2 = 0;

(p1)
while true do
    y1 = y2 + 1;
    while y2 ≠ 0 and y1 > y2 do done;
    critical section
    y1 = 0;
done

(p2)
while true do
    y2 = y1 + 1;
    while y1 ≠ 0 and y2 ≥ y1 do done;
    critical section
    y2 = 0;
done

Fig. 4. Pseudo-code for the Bakery algorithm. 







(p1)
    0 → 1   y1 ← y2 + 1
    1 → 1   y2 ≠ 0 and y1 > y2
    1 → 2   y2 = 0 or y1 ≤ y2
    2 → 0   y1 ← 0
    (node 2: critical section)

(p2)
    a → b   y2 ← y1 + 1
    b → b   y1 ≠ 0 and y2 ≥ y1
    b → c   y1 = 0 or y2 < y1
    c → a   y2 ← 0
    (node c: critical section)

Fig. 5. Control flow graphs of processes p1 and p2 in the Bakery algorithm.
(0,a): y1 = 0, y2 = 0
(0,b): y1 = 0, y2 ≥ 1
(0,c): y1 = 0, y2 ≥ 1
(1,a): y1 ≥ 1, y2 = 0
(1,b): y1 ≥ 1, y2 ≥ 1
(1,c): y1 ≥ 2, y2 ≥ 1, y1 − y2 = 1
(2,a): y1 ≥ 1, y2 = 0
(2,b): y1 ≥ 1, y2 ≥ 1, y1 − y2 ∈ [−1, 0]
(2,c): ⊥

Fig. 6. Result of our analyzer on the nondeterministic interleaving product graph of p1 and p2 in the Bakery algorithm.





7 Extensions and Future Work

Precision Improvement. In our analysis, we only find a coarse set of the invariants held in a program, since finding all invariants of the form $(x - y \le c)$ and $(\pm x \le c)$ for all programs is non-computable. Possible losses of precision have three causes: non-exact union, widening in loops, and non-exact assignment and guard transfer functions.

We made crude approximations in the last (general) case of Definitions 12 and 14, and there is room for improving the assignment and guard transfer functions, even though exactness is impossible. When the DBM lattices are complete, there exist most precise transfer functions such that Theorems 11 and 13 hold; however, these functions may be difficult to compute.



Finite Union of $\mathcal{D}^0$-Domains. One can imagine representing finite unions of $\mathcal{D}^0$-domains, using a finite set of DBMs instead of a single one as abstract state. This allows an exact union operator, but it may lead to a memory and time cost explosion as abstract states contain more and more DBMs, so one may need from time to time to replace a set of DBMs by their union approximation.

The model-checking community has also developed specific structures to represent finite unions of $\mathcal{D}$-domains that are less costly than sets. Clock-Difference Diagrams (introduced in 1999 by Larsen, Weise, Yi and Pearson) and Difference Decision Diagrams (introduced in Møller, Lichtenberg, Andersen and Hulgaard’s CSL’99 paper) are tree-based structures made compact thanks to the sharing of isomorphic sub-trees; however, the existence of normal forms for such structures is only a conjecture at the time of writing, and only local or path reduction algorithms exist. One can imagine adapting such structures to abstract interpretation the way we adapted DBMs in this paper.



Space and Time Cost Improvement. Space is often a big concern in abstract interpretation. The DBM representation we proposed in this paper has a fixed $O(n^2)$ memory cost, where $n$ is the number of variables in the program. In the actual implementation, we decided to use the graph representation (or hollow matrix), which stores only edges with a finite weight, and observed a great space gain, as most DBMs we use have many $+\infty$ coefficients. Most algorithms are also faster on hollow matrices, and we chose to use the more complex, but more efficient, Johnson shortest-path closure algorithm, described in Cormen, Leiserson and Rivest’s textbook [2, §26.3], instead of the Floyd-Warshall algorithm.

Larsen, Larsson, Pettersson and Yi’s RTSS’97 paper presents a minimal form algorithm which finds a DBM with the fewest finite edges representing a given $\mathcal{D}^0$-domain. This minimal form could be useful for memory-efficient storage, but cannot be used for direct computation with algorithms requiring closed DBMs.



Representation Improvement. The invariants we manipulate are, in terms of precision and complexity, between interval and polyhedron analysis. It is interesting to look for domains allowing the representation of more forms of invariants than DBMs in order to increase the granularity of numerical domains. We are currently working on an improvement of DBMs that allows us to represent, with a small time and space complexity overhead, invariants of the form $(\pm x \pm y \le c)$.

8 Conclusion 

We presented in this paper a new numerical abstract domain inspired by the well-known domain of intervals and the Difference-Bound Matrices. This domain allows us to manipulate invariants of the form $(x - y \le c)$, $(x \le c)$ and $(x \ge c)$ with an $O(n^2)$ worst-case memory cost per abstract state and an $O(n^3)$ worst-case time cost per abstract operation (where $n$ is the number of variables in the program).

Our approach made it possible for us to prove the correctness of some non- 
trivial algorithms beyond the scope of interval analysis, for a much smaller cost 
than polyhedron analysis. We also proved that this analysis always gives better 
results than interval analysis, for a slightly greater cost. 
Abstract. Object-oriented programming facilitates the development of
generic software, but at a significant cost in terms of performance. We 
apply partial evaluation to object-oriented programs, to automatically 
map generic software into specific implementations. In this paper we give 
a concise, formal description of a simple partial evaluator for a minimal 
object-oriented language, and give directions for extending this partial 
evaluator to handle realistic programs. 



1 Introduction 

The object-oriented style of programming naturally leads to the development of 
generic program components. Encapsulation of data and code into objects en- 
hances code resilience to program modifications and increases the opportunities 
for direct code reuse. Message passing between objects lets program components 
communicate without relying on a specific implementation; this decoupling en- 
ables dynamic modification of the program structure in order to react to chang- 
ing conditions. Genericity implemented using these language features is however 
achieved at the expense of efficiency. Encapsulation isolates individual program 
parts and increases the cost of data access. Message passing is implemented us- 
ing virtual dispatching, which obscures control flow, thus blocking traditional 
optimizations at both the hardware and software level. 

Partial evaluation is an automated technique for mapping generic programs 
into specific implementations dedicated to a specific purpose. Partial evaluation 
has been investigated extensively for functional, logical and imperative languages, and has recently been investigated for object-oriented languages by Schultz et al., in the context of a prototype partial evaluator for Java. However, no precise specification of partial evaluation for object-oriented languages has thus far been given.

In this paper, we give a concise description of the effect of partial evaluation 
on an object-oriented program, and formalize how an object-oriented program 
can be specialized using an off-line partial evaluator. The formalization is done 

* Based on work done in the Compose Group at IRISA/INRIA, Rennes, France; supported in part by Bull.
on a minimal object-oriented language without side-effects, and the partial evaluator that we define has monovariant binding times and no partially static data. Nevertheless, we argue that these partial evaluation principles can be extended to specialize realistic programs written in Java. Indeed, these principles form the basis of a complete partial evaluator for Java, briefly described in Section 9 and described in detail elsewhere. We consider class-based object-oriented languages; partial evaluation for object-based object-oriented languages is future work.

Overview: First, Section 2 gives a concise description of the effect of partial evaluation on an object-oriented program. Then, Section 3 defines a small object-oriented language based on Java. The four following sections define a partial evaluator for this language: Section 4 defines a two-level syntax, Section 5 gives well-annotatedness rules, Section 6 gives specialization rules, and Section 7 gives a constraint system for deriving well-annotated programs. Afterwards, Section 8 provides examples of how this partial evaluator can specialize small object-oriented programs, and Section 9 summarizes the features needed to scale up the partial evaluator to specialize realistic Java programs. Last, Section 10 investigates related work, and Section 11 concludes and discusses future work.

Terminology: In object-oriented programming, the word “specialize” usually 
means “to subclass,” and the word “static” usually indicates a class method 
(i.e., a method that does not have a self parameter). We here use the word spe- 
cialize in a different sense, to mean the optimization of a program or a program 
part based on knowledge about the evaluation context. Also, we always use the 
word static to indicate known information. 



2 Specializing Object-Oriented Programs 

In this section, we first describe the basic principles for specializing object- 
oriented programs, and then give an example. 



2.1 Basic Principles 

We first explain how to specialize a program by specializing its methods, then 
explain how to generate a specialized method, and last explain how to reintegrate 
specialized methods into the program. 

Globally, the execution of an object-oriented program can be seen as a se- 
quence of interactions between the objects that constitute the program. Parts 
of this interaction may become fixed when particular program input parameters 
are fixed. Given fixed program input parameters, partial evaluation can special- 
ize the program by simplifying the object interaction as much as possible. The 
static (known) interactions can be evaluated, leaving behind only the dynamic 
(unknown) interactions. 
Objects interact by using virtual dispatches to invoke methods. We can specialize the interaction that takes place between a collection of objects by specializing their methods for any static arguments. Each specialized method is added
to the object where the corresponding generic method is defined. Using this ap- 
proach, the specialized object interaction is expressed in terms of the specialized 
methods: a specialized method interacts with some object by calling specialized 
methods on this object. 

A method is specialized to a set of static values by propagating these values 
throughout the body, and using them to reduce field lookups, method invoca- 
tions, and non-object computations. A lookup of a static value stored in a field 
of a static object yields a value that can be used to specialize other parts of the 
program. A virtual dispatch is akin to a conditional that tests the type of the re- 
ceiver object and subsequently calls the appropriate receiver method. When the 
receiver object is static, the virtual dispatch can be eliminated, and the body 
of the method unfolded into the caller. When the receiver object is dynamic 
but is passed static arguments, the virtual dispatch can be specialized specu- 
latively; each potential receiver method is specialized for the static arguments, 
and a virtual dispatch to the specialized methods is residualized. Object-oriented 
languages often include features from functional or imperative languages; such 
features can be specialized according to the known partial evaluation principles 
for these languages. 
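To make these situations concrete, here is a hand-written Java illustration. The classes are hypothetical and the specialized code is what a partial evaluator could plausibly produce, not the output of any actual tool; the method name `e_2_3` is an invented naming convention. It shows a specialized method added to each class that defines the generic one, a residual virtual dispatch for a dynamic receiver with static arguments, and an unfolded body for a static receiver.

```java
// Hand-written illustration (hypothetical classes, not tool output).
abstract class Op {
    abstract int e(int x, int y);
    abstract int e_2_3();               // e specialized for x = 2, y = 3
}

final class AddOp extends Op {
    int e(int x, int y) { return x + y; }
    int e_2_3() { return 5; }           // 2 + 3 computed at specialization time
}

final class MulOp extends Op {
    int e(int x, int y) { return x * y; }
    int e_2_3() { return 6; }           // 2 * 3 computed at specialization time
}

final class Client {
    // Generic code: receiver and arguments both flow in at run time.
    static int generic(Op op, int x, int y) { return op.e(x, y); }

    // Dynamic receiver, static arguments x = 2, y = 3: each possible receiver
    // method was specialized (e_2_3 above) and a virtual dispatch to the
    // specialized methods remains in the residual code.
    static int dynamicReceiver(Op op) { return op.e_2_3(); }

    // Static receiver (known to be a MulOp), dynamic arguments: the dispatch
    // is eliminated and the body of MulOp.e is unfolded into the caller.
    static int staticReceiver(int x, int y) { return x * y; }
}
```

Keeping each `e_2_3` next to its generic `e` mirrors the rule that a specialized method is added to the object where the corresponding generic method is defined.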

The result of specializing a program is a collection of specialized methods to 
be introduced into the classes of the program. However, introducing the special- 
ized methods directly into the classes of the program is problematic: encapsula- 
tion invariants may be broken by specialized methods where safety checks have 
been specialized away, and this mix of generic and specialized code obfuscates the 
appearance of the program and complicates maintenance. A representation of 
the specialized program is needed that preserves encapsulation and modularity. 

We observe that the dependencies between the specialized methods follow 
the control flow of the program, which cuts across the class structure of the 
program. This observation brings aspect-oriented programming to mind; aspect- 
oriented programming allows logical units that cut across the program structure 
to be separated from other parts of the program and encapsulated into an aspect. The methods generated by a given specialization of an object-oriented program can be encapsulated into a separate aspect, and only woven into the program during compilation. Access modifiers can be used to ensure that specialized methods can only be called from specialized methods encapsulated in the same aspect, and hence are always called from a safe context. Furthermore, the
specialized code is cleanly separated from the generic code, and can be plugged 
and unplugged by selecting whether to include the aspect in the program. 

In this paper, we represent specialized programs using an aspect syntax based
on the AspectJ language [?]. In this syntax, a specialized program is a named
aspect which holds a number of introduction blocks. Each introduction block lists
a set of methods to introduce into the class named by the block header. Note
that to permit a standard compiler to be used, a weaver will usually produce a
standard, object-oriented program. Also, if whole-program specialization is used,
the set of specialized methods will be self-contained, and the aspect syntax would
thus be redundant.

  class Power {
    int exp; Binary op; int neutral;
    Power(int exp, Binary op, int neutral) {
      super();
      this.exp = exp; this.op = op;
      this.neutral = neutral;
    }
    int raise(int base) {
      return loop(base, this.exp);
    }
    int loop(int base, int x) {
      return x==0
        ? this.neutral
        : this.op.e(base, this.loop(base, x-1));
    }
  }

  class Binary {
    int e(int x, int y) {
      return this.e(x,y);
    }
  }

  class Add extends Binary {
    int e(int x, int y) {
      return x+y;
    }
  }

  class Mul extends Binary {
    int e(int x, int y) {
      return x*y;
    }
  }

Fig. 1. A power function and binary operators.

2.2 Example: Power 

As an example of how partial evaluation for object-oriented languages specializes
a program, we use the collection of Java classes shown in Figure 1. These classes
implement an object-oriented version of the power function, parameterized by
the exponent, the binary operator to apply, and the neutral value. The power
function is computed by the method raise of the class Power; this method uses
the recursive method loop to repeatedly apply a binary operator to a value.
The binary operator functionality is delegated to a Binary object, following the
Strategy design pattern [?]. The class Binary is the common superclass of the
two binary operators Add and Mul.¹

¹ For simplicity, rather than making Binary an abstract class, we use a class with
diverging methods.

We can specialize the method raise of the class Power in a number of ways; the
results are illustrated in Figure 2. First, assume that the exponent field is known;
propagating the value stored in the exponent field throughout the program allows
the recursive calls to loop made by raise to be unfolded. The result is shown in the
aspect Exp_Known. Next, we can specialize the program according to a different
context where the operator and neutral values are also known; the virtual dispatch
to the binary operator can be resolved and unfolded, and the neutral value
directly residualized. The result is shown in the aspect Exp_Op_Neutral_Known. As
a last example, we can specialize the program based only on the information
that the base value is known; speculative specialization allows each e method to
be specialized for the known base value, as shown in the aspect Base_Known.

  aspect Exp_Known {
    introduction Power {
      int raise_3(int base) {
        return this.op.e(base, this.op.e(base, this.op.e(base, this.neutral)));
      }
    }
  }

  aspect Exp_Op_Neutral_Known {
    introduction Power {
      int raise_3_Mul_1(int base) { return base*(base*(base*1)); }
    }
  }

  aspect Base_Known {
    introduction Power {
      int raise_2() { return this.loop_2(this.exp); }
      int loop_2(int x) {
        return x==0 ? this.neutral : this.op.e_2(this.loop_2(x-1));
      }
    }
    introduction Binary { int e_2(int y) { return this.e_2(y); } }
    introduction Add { int e_2(int y) { return 2+y; } }
    introduction Mul { int e_2(int y) { return 2*y; } }
  }

Fig. 2. Various specializations of the power example.
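As a sanity check on Figure 2, the following Java sketch shows what the classes of Figure 1 might look like if all three aspects were woven in by hand. It is an illustration, not output of the partial evaluator: plain Java has no introduction blocks, so the introduced methods are simply written into the classes, and each specialized method is only equivalent to raise in the context it was specialized for (for example, raise_3 assumes exp == 3).

```java
// Sketch: Figure 1's classes with the methods of all three aspects of Figure 2
// woven in by hand. Introduced methods are marked; each is only equivalent to
// raise in the context it was specialized for.
class Binary {
    int e(int x, int y) { return this.e(x, y); }  // diverges: stands in for an abstract method
    int e_2(int y) { return this.e_2(y); }        // introduced by Base_Known
}

class Add extends Binary {
    int e(int x, int y) { return x + y; }
    int e_2(int y) { return 2 + y; }              // introduced by Base_Known
}

class Mul extends Binary {
    int e(int x, int y) { return x * y; }
    int e_2(int y) { return 2 * y; }              // introduced by Base_Known
}

class Power {
    int exp; Binary op; int neutral;

    Power(int exp, Binary op, int neutral) {
        super();
        this.exp = exp; this.op = op; this.neutral = neutral;
    }

    // Generic code from Figure 1.
    int raise(int base) { return loop(base, this.exp); }
    int loop(int base, int x) {
        return x == 0 ? this.neutral : this.op.e(base, this.loop(base, x - 1));
    }

    // Introduced by Exp_Known: valid when exp == 3.
    int raise_3(int base) {
        return this.op.e(base, this.op.e(base, this.op.e(base, this.neutral)));
    }

    // Introduced by Exp_Op_Neutral_Known: valid when exp == 3, op is a Mul, neutral == 1.
    int raise_3_Mul_1(int base) { return base * (base * (base * 1)); }

    // Introduced by Base_Known: valid when base == 2.
    int raise_2() { return this.loop_2(this.exp); }
    int loop_2(int x) { return x == 0 ? this.neutral : this.op.e_2(this.loop_2(x - 1)); }
}
```

For instance, new Power(3, new Mul(), 1) yields 125 for raise(5), raise_3(5) and raise_3_Mul_1(5) alike, and new Power(4, new Add(), 0) yields 8 for both raise(2) and raise_2().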



3 Extended Featherweight Java 

To define partial evaluation for an object-oriented language, we use a small
class-based object-oriented language based on Java [?], named Extended Feather-
weight Java (EFJ) after Featherweight Java [?]. EFJ is intended to constitute
a least common denominator for class-based languages, so that any partial evalu-
ation principles developed for EFJ will apply to most other class-based languages
as well. EFJ is a subset of Java without side effects, and an EFJ program be-
haves like the syntactically equivalent Java program. EFJ incorporates classes
and inheritance in a statically typed setting, object fields, virtual methods with
formal parameters, and object constructors.
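$$
\begin{array}{lcl}
P \in \mathit{Program} & ::= & (\{CL_1,\ldots,CL_n\},\ e)\\
CL \in \mathit{Class} & ::= & \mathtt{class}\ C\ \mathtt{extends}\ D\ \{T_1\ f_1;\ldots;T_n\ f_n;\ K\ M_1\ldots M_k\}\\
T \in \mathit{Type} & ::= & \mathtt{int} \mid \mathtt{boolean} \mid C\\
K \in \mathit{Constructor} & ::= & C(T_1\ f_1,\ldots,T_n\ f_n)\\
 & & \quad \{\mathtt{super}(f_1,\ldots,f_i);\ \mathtt{this}.f_{i+1} = f_{i+1};\ \ldots;\ \mathtt{this}.f_n = f_n;\}\\
M \in \mathit{Method} & ::= & T\ m(T_1\ x_1,\ldots,T_n\ x_n)\ \{\mathtt{return}\ e;\}\\
e \in \mathit{Expression} & ::= & c \mid x \mid e_0.f \mid e_0.m(e_1,\ldots,e_n)\\
 & \mid & \mathtt{new}\ C(e_1,\ldots,e_n) \mid (C)e_0 \mid e_0\ \mathit{OP}\ e_1 \mid (e_0\,?\,e_1 : e_2)\\
\mathit{OP} \in \mathit{Operator} & ::= & + \mid - \mid * \mid / \mid < \mid > \mid == \mid \&\& \mid \mathtt{||}\\
c \in \mathit{Constant} & ::= & \mathtt{true} \mid \mathtt{false} \mid 0 \mid 1 \mid -1 \mid \ldots
\end{array}
$$

$$
C, D \in \textit{Class-name},\quad f \in \textit{Field-name},\quad m \in \textit{Method-name},\quad x \in \mathit{Variable}
$$

Values that result from computation:

$$
v \in \mathit{Value} ::= c \mid \mathit{object}_C(v_1,\ldots,v_n)
$$

Fig. 3. EFJ syntax and values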



Like Java, EFJ is a statically typed object-oriented language. We do not
define the EFJ typing rules here; we refer to the original presentation of Feath-
erweight Java [?] or the author’s PhD dissertation for a description of the
EFJ typing rules. Only the subtyping relation between classes is directly used
in our formalization; subtyping follows the class hierarchy, and is denoted <:.



3.1 EFJ Syntax 

The syntax of EFJ is given in Figure 3. A program is a collection of classes and a 
main expression. Each class in the program extends some superclass, and declares 
a number of fields, a constructor, and a number of methods. A constructor always 
calls the constructor of the superclass first and then initializes each field declared 
in the class afterward; the constructor is the only place where fields can be 
assigned values, because there are no side-effects in the language. The definition 
of a constructor is fixed given the fields of a class and its superclass, and the 
semantics of object initialization is not defined in terms of the constructor but 
is defined directly in terms of the fields of the class. However, writing out the 
constructor allows us to retain a Java-compatible syntax. The body of a method 
is a single expression. An expression can be a constant, a variable, a field lookup, 
a virtual method invocation, an object instantiation, a class cast, an operator 
application, or a conditional. 

A value computed by the program can be either a constant or an object; an 
object is represented as a tuple of values labeled with the name of the class of 
the object. 

The special class Object can neither be declared nor instantiated but is part 
of every program. This class extends no other class, and has no methods and 
no fields; with the exception of this class, all classes referenced in the program 
must also be defined in the program. Furthermore, there should be no cycles in 
the inheritance relation between classes. 
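$$
\frac{}{\sigma \vdash c \to c}\;(\text{R-Const})
\qquad
\frac{}{\sigma \vdash x \to \sigma(x)}\;(\text{R-Var})
$$

$$
\frac{\forall i \in 1\ldots n\quad \sigma \vdash e_i \to v_i}
     {\sigma \vdash \mathtt{new}\ C(e_1,\ldots,e_n) \to \mathit{object}_C(v_1,\ldots,v_n)}\;(\text{R-New})
$$

$$
\frac{\sigma \vdash e \to \mathit{object}_C(v_1,\ldots,v_n) \quad \mathit{fields}(C) = T_1\ f_1,\ldots,T_n\ f_n}
     {\sigma \vdash e.f_i \to v_i}\;(\text{R-Field})
$$

$$
\frac{\begin{array}{c}
\sigma \vdash e \to \mathit{object}_C(v_1,\ldots,v_n) \quad \forall i \in 1\ldots k\quad \sigma \vdash d_i \to d'_i \quad \mathit{mbody}(C,m) = ((x_1,\ldots,x_k),\ e_0)\\
[x_1 \mapsto d'_1,\ldots,x_k \mapsto d'_k,\ \mathtt{this} \mapsto \mathit{object}_C(v_1,\ldots,v_n)] \vdash e_0 \to v
\end{array}}
{\sigma \vdash e.m(d_1,\ldots,d_k) \to v}\;(\text{R-Invk})
$$

$$
\frac{\sigma \vdash e \to \mathit{object}_C(v_1,\ldots,v_n) \quad C <: D}
     {\sigma \vdash (D)e \to \mathit{object}_C(v_1,\ldots,v_n)}\;(\text{R-Cast})
$$

$$
\frac{\sigma \vdash e_0 \to \mathtt{true} \quad \sigma \vdash e_1 \to v}
     {\sigma \vdash (e_0\,?\,e_1 : e_2) \to v}\;(\text{R-Cond-T})
\qquad
\frac{\sigma \vdash e_0 \to \mathtt{false} \quad \sigma \vdash e_2 \to v}
     {\sigma \vdash (e_0\,?\,e_1 : e_2) \to v}\;(\text{R-Cond-F})
$$

$$
\frac{\sigma \vdash e_0 \to v_0 \quad \sigma \vdash e_1 \to v_1 \quad \mathcal{A}_{\mathit{OP}}(v_0, v_1) = v'}
     {\sigma \vdash e_0\ \mathit{OP}\ e_1 \to v'}\;(\text{R-Op})
$$

Environment σ : Var → Value
fields(C) = fields of class C
mbody(C, m) = body of method m defined in class C

Fig. 4. EFJ computation (see appendix for auxiliary definitions)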



3.2 EFJ Evaluation 

We define EFJ computation using the eager big-step semantics shown in Figure 4.
The evaluation rules have the form σ ⊢ e → v, where e is an expression that is
reduced into a value v in an environment σ that maps variables to values. The
evaluation rules are defined as follows. A new expression creates an object holding
the value of each expression passed to the constructor (R-New). A reference to a
field retrieves the corresponding value (R-Field). Method invocation first reduces
the self expression to decide the class of the receiver object, which determines
what method is called; the method body is evaluated in an environment that
binds the self object to the special variable this and binds each formal parameter
to the corresponding argument (R-Invk). Class casts can only be reduced when
the class of the concrete object is a subclass of the target type (R-Cast). The
evaluation rules for the other constructs are straightforward, and will not be
discussed. To compute the value of a complete program, the main expression of
the program must be evaluated in an environment that defines the values of any
free variables in the main expression.
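The rules of Figure 4 translate almost directly into a recursive interpreter. The Java sketch below is an illustration, not part of the formal development: the record and class names (Const, Var, Field, Invk, New, Cast, Op, Cond, ObjV, ClassDef, MethodDef, Efj) are hypothetical, Java Integer and Boolean values stand in for EFJ constants, and only the operators +, -, *, <, > and == are implemented. A failed cast, for which Figure 4 has no applicable rule, is reported as a ClassCastException. The sketch uses records and instanceof patterns, so it assumes Java 16 or later.

```java
import java.util.*;

// Values: Integer/Boolean for constants, ObjV for object_C(v1, ..., vn).
record ObjV(String cls, List<Object> fields) {}

// EFJ expressions (Figure 3).
interface Expr {}
record Const(Object c) implements Expr {}
record Var(String x) implements Expr {}
record Field(Expr e, String f) implements Expr {}
record Invk(Expr e, String m, List<Expr> args) implements Expr {}
record New(String cls, List<Expr> args) implements Expr {}
record Cast(String cls, Expr e) implements Expr {}
record Op(String op, Expr e0, Expr e1) implements Expr {}
record Cond(Expr e0, Expr e1, Expr e2) implements Expr {}

record MethodDef(List<String> params, Expr body) {}
record ClassDef(String name, String sup, List<String> fields, Map<String, MethodDef> methods) {}

class Efj {
    private final Map<String, ClassDef> classes = new HashMap<>();

    void add(ClassDef c) { classes.put(c.name(), c); }

    // fields(C): inherited fields first, then the fields declared in C.
    List<String> fields(String c) {
        if (c.equals("Object")) return new ArrayList<>();
        ClassDef d = classes.get(c);
        List<String> fs = fields(d.sup());
        fs.addAll(d.fields());
        return fs;
    }

    // mbody(C, m): search C and then its superclasses.
    MethodDef mbody(String c, String m) {
        for (String k = c; !k.equals("Object"); k = classes.get(k).sup()) {
            MethodDef md = classes.get(k).methods().get(m);
            if (md != null) return md;
        }
        throw new IllegalStateException("no method " + m + " in class " + c);
    }

    // C <: D follows the class hierarchy.
    boolean subtype(String c, String d) {
        for (String k = c; ; k = classes.get(k).sup()) {
            if (k.equals(d)) return true;
            if (k.equals("Object")) return false;
        }
    }

    Object eval(Map<String, Object> env, Expr e) {
        if (e instanceof Const k) return k.c();                            // R-Const
        if (e instanceof Var v) return env.get(v.x());                     // R-Var
        if (e instanceof New n) {                                          // R-New
            List<Object> vs = new ArrayList<>();
            for (Expr a : n.args()) vs.add(eval(env, a));
            return new ObjV(n.cls(), vs);
        }
        if (e instanceof Field f) {                                        // R-Field
            ObjV o = (ObjV) eval(env, f.e());
            return o.fields().get(fields(o.cls()).indexOf(f.f()));
        }
        if (e instanceof Invk inv) {                                       // R-Invk
            ObjV o = (ObjV) eval(env, inv.e());                            // receiver first
            MethodDef md = mbody(o.cls(), inv.m());
            Map<String, Object> callee = new HashMap<>();
            for (int i = 0; i < md.params().size(); i++)
                callee.put(md.params().get(i), eval(env, inv.args().get(i)));
            callee.put("this", o);
            return eval(callee, md.body());
        }
        if (e instanceof Cast cast) {                                      // R-Cast
            ObjV o = (ObjV) eval(env, cast.e());
            if (!subtype(o.cls(), cast.cls()))
                throw new ClassCastException(o.cls() + " is not a subclass of " + cast.cls());
            return o;
        }
        if (e instanceof Cond cd)                                          // R-Cond-T, R-Cond-F
            return (Boolean) eval(env, cd.e0()) ? eval(env, cd.e1()) : eval(env, cd.e2());
        if (e instanceof Op p) {                                           // R-Op
            Object a = eval(env, p.e0()), b = eval(env, p.e1());
            if (p.op().equals("==")) return a.equals(b);
            int x = (Integer) a, y = (Integer) b;
            switch (p.op()) {
                case "+": return x + y;
                case "-": return x - y;
                case "*": return x * y;
                case "<": return x < y;
                case ">": return x > y;
                default: throw new IllegalArgumentException("unsupported operator " + p.op());
            }
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }
}
```

For example, with Binary, Mul and Power encoded as in Figure 1, evaluating new Power(3, new Mul(), 1).raise(5) in an empty environment yields 125.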
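$$
\begin{array}{lcl}
2P \in 2\mathit{Program} & ::= & (\{2CL_1,\ldots,2CL_n\},\ 2e)\\
2CL \in 2\mathit{Class} & ::= & \mathtt{class}\ C\ \mathtt{extends}\ D\ \{T_1\ f_1;\ldots;T_n\ f_n;\ 2K\ 2M_1\ldots 2M_k\}\\
2K \in 2\mathit{Constructor} & ::= & K \mid \underline{K}\\
K \in \mathit{Constructor} & ::= & C(T_1\ f_1,\ldots,T_n\ f_n)\\
 & & \quad \{\mathtt{super}(f_1,\ldots,f_i);\ \mathtt{this}.f_{i+1} = f_{i+1};\ \ldots;\ \mathtt{this}.f_n = f_n;\}\\
2M \in 2\mathit{Method} & ::= & T\ m(2D_1,\ldots,2D_n)\ \{\mathtt{return}\ 2e;\}\\
 & \mid & T\ m(2D_1,\ldots,2D_n)\ \{\underline{\mathtt{return}}\ 2e;\}\\
2D \in 2\mathit{Declaration} & ::= & T\ x \mid T\ \underline{x}\\
2e \in 2\mathit{Expression} & ::= & e \mid \mathtt{lift}(2e_0) \mid \underline{x} \mid 2e_0\underline{.f} \mid 2e_0\underline{.m}(2e_1,\ldots,2e_n)\\
 & \mid & \underline{\mathtt{new}}\ C(2e_1,\ldots,2e_n) \mid \underline{(C)}2e_0 \mid 2e_0\ \underline{\mathit{OP}}\ 2e_1 \mid (2e_0\,\underline{?}\,2e_1\,\underline{:}\,2e_2)\\
e \in \mathit{Expression} & ::= & c \mid x \mid 2e_0.f \mid 2e_0.m(2e_1,\ldots,2e_n)\\
 & \mid & \mathtt{new}\ C(2e_1,\ldots,2e_n) \mid (C)2e_0 \mid 2e_0\ \mathit{OP}\ 2e_1 \mid (2e_0\,?\,2e_1 : 2e_2)\\
\mathit{OP} \in \mathit{Operator} & ::= & + \mid - \mid * \mid / \mid < \mid > \mid == \mid \&\& \mid \mathtt{||}\\
c \in \mathit{Constant} & ::= & \mathtt{true} \mid \mathtt{false} \mid 0 \mid 1 \mid -1 \mid \ldots
\end{array}
$$

$$
C, D \in \textit{Class-name},\quad f \in \textit{Field-name},\quad m \in \textit{Method-name},\quad x \in \mathit{Variable}
$$

Values that result from computation:

$$
v \in \mathit{Value} ::= c \mid \mathit{object}_C(v_1,\ldots,v_n) \mid \text{residual program part}
$$

Fig. 5. 2EFJ syntax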



4 Two-Level Language 

Partial evaluation can be formalized as evaluation in a language with a two-level
syntax [?]. The two-level separation of a program corresponds to a division
of the program into static and dynamic parts. Since binding times are made
syntactically explicit, specialization can be expressed straightforwardly using
evaluation rules for the two-level syntax. We use this approach to formalize EFJ
specialization.

We extend EFJ into a two-level language by adding new constructs that rep-
resent dynamic program parts, as shown in Figure 5; we name this language
Two-Level EFJ (2EFJ). Static 2EFJ constructs are written as their EFJ coun-
terparts, whereas dynamic constructs are underlined. Evaluation of a dynamic
program part residualizes a specialized program part, so the domain of values is
extended to include residual program parts.

To permit a static expression to appear within a dynamic context, we add 
a lift expression. As is normally the case in partial evaluation, we only allow 
base-type values to be lifted. Lifting object values could be done by generating 
residual new-expressions, but doing so would duplicate computation, and would 
furthermore be problematic in most object-oriented languages since object iden- 
tity would not be preserved. 
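The identity problem can be seen in a few lines of Java. The sketch below is a hypothetical illustration (the class names Point and LiftIdentity are made up): if a static object were "lifted" by emitting a residual new-expression at each dynamic use, each use would allocate its own copy, so an identity test that succeeds in the generic program would fail in the specialized one, and the allocation work would be duplicated.

```java
// Hypothetical static object that a binding-time analysis might want to lift.
class Point {
    final int x;
    Point(int x) { this.x = x; }
}

class LiftIdentity {
    // Generic code: both uses refer to one object, so the identity test succeeds.
    static boolean sharedUses() {
        Point p = new Point(1);
        Point a = p, b = p;
        return a == b;
    }

    // Code as it would look if p were lifted by emitting a residual
    // new-expression at each use: two allocations, two distinct objects.
    static boolean liftedUses() {
        Point a = new Point(1);
        Point b = new Point(1);
        return a == b;
    }
}
```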

We use monovariant binding times, which means that there is exactly one
binding time associated with each program point, and that we assign the same
binding time to all instances of a given class. Furthermore, we do not allow
partially static data, so all fields of a given class have the same binding time.
(We return to these restrictions in a later section.) We indicate the binding time of
the objects of a given class by a binding-time annotation on the constructor of
the class. We refer to a class with a statically-annotated constructor as a static
class, and to a class with a dynamically-annotated constructor as a dynamic class.

For a method definition, the binding-time annotation on the class indicates 
the binding time of the self, the binding time of each formal parameter is indi- 
cated by its annotation, and the annotation on the return keyword indicates the 
binding time of the method return value. 



5 Well-Annotatedness 

We now define a set of rules that ensure 2EFJ well-annotatedness: a well- 
annotated (and well-typed) 2EFJ program either diverges, stops at the reduction 
of an illegal type cast, or reduces to a specialized program. In this section, we 
first discuss the relation between a class and its subclasses in terms of binding 
times, and then give a type system that defines well-annotatedness for 2EFJ 
programs. 



5.1 Binding Times and Inheritance 

The binding times of the classes of a program are influenced not only by how 
object instances are used in the program, but also by the inheritance relation 
between the classes. 

The binding time of two objects that are used at the same program point (a 
field lookup or method invocation expression) must be equal. We use monovari- 
ant binding times, so the classes of such two objects must have the same binding 
time. 

We use a type inferencing algorithm to predict the types of the objects that 
may be used at a given field access or method invocation, and thereby also to 
predict the control flow of the program. (Thus, the type inferencing algorithm 
can also be thought of as a control-flow analysis.) This type inferencing algo- 
rithm could in principle infer concrete types (i.e., a type more specific than the 
qualifying type given in the program); the more precise the type inferencing al- 
gorithm, the smaller the set of types at each program point, and thus the fewer 
restrictions there are on the binding time of each class. For simplicity, we com- 
pute the set of types using the EFJ type inference rules: for a given field access 
or method invocation, the set of possible types is the complete set of subtypes 
of the type inferred for the expression. Thus, a class that is used as the qualify- 
ing type of the self object in a field access or method invocation has the same 
binding time as its subclasses. Note that the class Object has neither fields nor 
methods, and thus never serves as the qualifying type in such expressions. More 
precise type annotations can be obtained by using a more precise type-inference
algorithm, several of which are presented in the literature [?].

In summary, the binding times of two classes are linked across a common
superclass if an object qualified by this common superclass is the subject of a
field access or method invocation. With a more precise type-inference algorithm,
fewer classes would be linked in this way, since the set of types inferred at each
such program point would be smaller.
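The linking of binding times across a common superclass can be computed with a union-find structure. The Java sketch below assumes the conservative EFJ typing described above (every subclass of the qualifying type may appear at the program point); the class BtLinking and its methods are hypothetical names. Applied to the classes of Figure 1, the invocation this.op.e(...) in Power.loop, whose receiver is qualified by Binary, links Binary, Add and Mul, while Power stays independent.

```java
import java.util.HashMap;
import java.util.Map;

class BtLinking {
    private final Map<String, String> superOf = new HashMap<>();  // class -> direct superclass
    private final Map<String, String> parent = new HashMap<>();   // union-find forest over classes

    void declare(String c, String sup) {
        superOf.put(c, sup);
        parent.put(c, c);
    }

    private String find(String c) {
        String p = parent.get(c);
        if (p.equals(c)) return c;
        String root = find(p);
        parent.put(c, root);   // path compression
        return root;
    }

    private boolean isSubtype(String c, String d) {
        for (String k = c; k != null; k = superOf.get(k))
            if (k.equals(d)) return true;
        return false;
    }

    // A field access or method invocation whose receiver has qualifying type q
    // may see an instance of any subclass of q, so all of them get one binding
    // time. (Object is never a qualifying type, so it is not declared here.)
    void qualifyingUse(String q) {
        for (String c : superOf.keySet())
            if (isSubtype(c, q)) parent.put(find(c), find(q));
    }

    boolean linked(String a, String b) { return find(a).equals(find(b)); }
}
```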
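$$
\frac{}{\Gamma \vdash c : S}\;(\text{W-Const})
\qquad
\frac{}{\Gamma \vdash x : \Gamma(x)}\;(\text{W-Var})
$$

$$
\frac{\Gamma \vdash e : S \quad \mathit{type}(e) = \{C_1,\ldots,C_k\} \quad \forall i \in 1\ldots k\ \ \textit{field-bt}(C_i, f) = S}
     {\Gamma \vdash e.f : S}\;(\text{W-S-Field})
$$

$$
\frac{\forall i \in 1\ldots n\ \ \Gamma \vdash e_i : S \quad \textit{class-bt}(C) = S}
     {\Gamma \vdash \mathtt{new}\ C(e_1,\ldots,e_n) : S}\;(\text{W-S-New})
\qquad
\frac{\Gamma \vdash e : S \quad \textit{class-bt}(C) = S}
     {\Gamma \vdash (C)e : S}\;(\text{W-S-Cast})
$$

$$
\frac{\Gamma \vdash e_1 : S \quad \Gamma \vdash e_2 : S}
     {\Gamma \vdash e_1\ \mathit{OP}\ e_2 : S}\;(\text{W-S-OP})
\qquad
\frac{\Gamma \vdash e_0 : S \quad \Gamma \vdash e_1 : T \quad \Gamma \vdash e_2 : T}
     {\Gamma \vdash (e_0\,?\,e_1 : e_2) : T}\;(\text{W-S-Cond})
$$

$$
\frac{\Gamma \vdash e : S \quad \mathit{type}(e) \in \{\mathtt{int}, \mathtt{boolean}\}}
     {\Gamma \vdash \mathtt{lift}(e) : D}\;(\text{W-Lift})
$$

$$
\frac{\Gamma \vdash e : D \quad \mathit{type}(e) = \{C_1,\ldots,C_k\} \quad \forall i \in 1\ldots k\ \ \textit{field-bt}(C_i, f) = D}
     {\Gamma \vdash e\underline{.f} : D}\;(\text{W-D-Field})
$$

$$
\frac{\forall i \in 1\ldots n\ \ \Gamma \vdash e_i : D \quad \textit{class-bt}(C) = D}
     {\Gamma \vdash \underline{\mathtt{new}}\ C(e_1,\ldots,e_n) : D}\;(\text{W-D-New})
\qquad
\frac{\Gamma \vdash e : D \quad \textit{class-bt}(C) = D}
     {\Gamma \vdash \underline{(C)}e : D}\;(\text{W-D-Cast})
$$

$$
\frac{\Gamma \vdash e_1 : D \quad \Gamma \vdash e_2 : D}
     {\Gamma \vdash e_1\ \underline{\mathit{OP}}\ e_2 : D}\;(\text{W-D-OP})
\qquad
\frac{\forall i \in \{0,1,2\}\ \ \Gamma \vdash e_i : D}
     {\Gamma \vdash (e_0\,\underline{?}\,e_1\,\underline{:}\,e_2) : D}\;(\text{W-D-Cond})
$$

$$
\frac{\begin{array}{c}
\Gamma \vdash e : S \quad \forall i \in 1\ldots n\ \ \Gamma \vdash e_i : T_i \quad \mathit{type}(e) = \{C_1,\ldots,C_k\}\\
\forall j \in 1\ldots k\ \ \textit{bt-signature}(C_j, m) = S.(T_1,\ldots,T_n) \to T_r
\end{array}}
{\Gamma \vdash e.m(e_1,\ldots,e_n) : T_r}\;(\text{W-S-Invk})
$$

$$
\frac{\begin{array}{c}
\Gamma \vdash e : D \quad \forall i \in 1\ldots n\ \ \Gamma \vdash e_i : T_i \quad \mathit{type}(e) = \{C_1,\ldots,C_k\}\\
\forall j \in 1\ldots k\ \ \textit{bt-signature}(C_j, m) = D.(T_1,\ldots,T_n) \to D
\end{array}}
{\Gamma \vdash e\underline{.m}(e_1,\ldots,e_n) : D}\;(\text{W-D-Invk})
$$

Binding times BT: S, D. Binding-time environment Γ : Var → BT
class-bt(C) = binding time of class C; field-bt(C, f) = binding time of field f in class C
bt-signature(C, m) = binding-time signature of m in class C
type: maps a 2EFJ expression into a set of types that includes the types of the
values that may result from evaluating the expression

Fig. 6. Rules for well-annotated expressions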



5.2 Well-Annotatedness Rules 

We define well-annotatedness of a 2EFJ program using the rules of Figures
6 and 7. These rules are used to check that the binding-time annotation of each
construct in the program is consistent with the annotations on the rest of the
program. The rules have the form Γ ⊢ e : T, meaning “in the environment Γ,
the two-level expression e has binding time T.” The well-annotatedness rules are
syntax directed, and use a number of auxiliary definitions; these definitions are
summarized in the figures, and described in detail in the appendix. We use D to
indicate a dynamic binding time and S to indicate a static binding time.

Methods:

$$
\frac{\textit{bt-signature}(C, m) = T_0.(T_1,\ldots,T_n) \to S \quad \Gamma = \textit{build-env}(\textit{no-bt}(P), T_0, (T_1,\ldots,T_n)) \quad \Gamma \vdash e : S}
     {T\ m(P)\ \{\ \mathtt{return}\ e;\ \}\ \text{OK in } C}\;(\text{W-S-Method})
$$

$$
\frac{\textit{bt-signature}(C, m) = T_0.(T_1,\ldots,T_n) \to D \quad \Gamma = \textit{build-env}(\textit{no-bt}(P), T_0, (T_1,\ldots,T_n)) \quad \Gamma \vdash e : D}
     {T\ m(P)\ \{\ \underline{\mathtt{return}}\ e;\ \}\ \text{OK in } C}\;(\text{W-D-Method})
$$

Classes:

$$
\frac{\forall i \in 1\ldots p\ \ M_i\ \text{OK in } C}
     {\mathtt{class}\ C\ \mathtt{extends}\ D\ \{\ C_1\ f_1;\ldots;C_n\ f_n;\ 2K\ M_1\ldots M_p\ \}\ \text{OK}}
$$

Program:

$$
\frac{\forall CL \in \{CL_1,\ldots,CL_n\}:\ CL\ \text{OK} \quad \Gamma_0 \vdash e : T}
     {(\{CL_1,\ldots,CL_n\},\ e)\ \text{OK in } \Gamma_0}
$$

no-bt: maps a 2EFJ program part into the corresponding EFJ program part
build-env: builds a binding-time environment from a list of formal parameters
and a list of binding times to associate with these parameters

Fig. 7. Rules for well-annotated methods, classes and programs

The well-annotatedness rules for expressions (Figure 6) are defined as follows.
The binding-time annotation of a lift expression is dynamic, and its argument
must be a static base-type value (W-Lift). The binding-time annotation of a field
access must correspond to the binding time of the field across all classes that
may be used at this program point, and is equal to the binding time of the
classes that contain the field (W-S-Field and W-D-Field). The binding time of an
object instantiation must be equal to the binding time of the class that is being
instantiated (W-S-New and W-D-New). Similarly, the binding time of a cast must
be equal to the binding time of the argument and of the class that it is being
cast to (W-S-Cast and W-D-Cast). For a method invocation, the binding time of
the self object must be equal to the binding time of the classes of the possible
receiver objects, and the binding times of the formal parameters must be equal
to the binding times of the actual arguments (W-S-Invk and W-D-Invk). The
well-annotatedness rules for the other constructs are straightforward, and will
not be discussed.
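A checker for a fragment of the rules of Figure 6 can be sketched in Java. As a simplifying assumption, the sketch covers only constants, variables, lift, operators and conditionals over integer values (so the base-type side condition of W-Lift always holds); the types BT, AExpr, AConst, AVar, ALift, AOp, ACond and WellAnnotated are hypothetical, and a boolean flag stands in for the underline that marks a dynamic construct.

```java
import java.util.Map;

enum BT { S, D }

interface AExpr {}
record AConst(int c) implements AExpr {}                                     // c
record AVar(String x) implements AExpr {}                                    // x
record ALift(AExpr e) implements AExpr {}                                    // lift(e)
record AOp(String op, boolean dyn, AExpr e1, AExpr e2) implements AExpr {}   // e1 OP e2, OP underlined if dyn
record ACond(boolean dyn, AExpr e0, AExpr e1, AExpr e2) implements AExpr {}  // (e0 ? e1 : e2)

class WellAnnotated {
    // Returns the binding time of e under env, or throws if e is not well-annotated.
    static BT check(Map<String, BT> env, AExpr e) {
        if (e instanceof AConst) return BT.S;                                   // W-Const
        if (e instanceof AVar v) return env.get(v.x());                         // W-Var
        if (e instanceof ALift l) {                                             // W-Lift
            expect(check(env, l.e()), BT.S, "argument of lift");
            return BT.D;
        }
        if (e instanceof AOp o) {                                               // W-S-OP / W-D-OP
            BT bt = o.dyn() ? BT.D : BT.S;
            expect(check(env, o.e1()), bt, "left operand of " + o.op());
            expect(check(env, o.e2()), bt, "right operand of " + o.op());
            return bt;
        }
        if (e instanceof ACond c) {
            if (c.dyn()) {                                                      // W-D-Cond
                for (AExpr sub : new AExpr[] {c.e0(), c.e1(), c.e2()})
                    expect(check(env, sub), BT.D, "part of a dynamic conditional");
                return BT.D;
            }
            expect(check(env, c.e0()), BT.S, "test of a static conditional");   // W-S-Cond
            BT t = check(env, c.e1());
            expect(check(env, c.e2()), t, "second branch");
            return t;
        }
        throw new IllegalArgumentException("unknown expression " + e);
    }

    private static void expect(BT actual, BT wanted, String what) {
        if (actual != wanted)
            throw new IllegalStateException(what + " is " + actual + ", expected " + wanted);
    }
}
```

For example, with x dynamic and n static, x * lift(n + 1) (with * underlined) is accepted with binding time D, while x * 1 is rejected because the static constant appears in a dynamic context without a lift.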

For a method declaration in a class C to be well-annotated, the binding time
of its body must be equal to the binding-time annotation on the return statement
(Figure 7, judgment “m OK in C”). The binding time of the body is checked using
the well-annotatedness rules for expressions, in an environment defined by the
binding-time annotations on the class and the method formal parameters. For
a class C to be well-annotated, each method must be well-annotated (judgment
“C OK”). Similarly, for a program P to be well-annotated in an environment Γ0
that provides binding times for any free variables, the main expression and each
class must be well-annotated (judgment “P OK in Γ0”).

6 Specialization 

The static parts of a 2EFJ program can be reduced away, leaving behind only 
the dynamic parts. Evaluation of a well-annotated 2EFJ program either diverges, 
stops at an illegal static type cast, or results in a specialized program. 



6.1 2EFJ Expression Evaluation 

Figure 8 shows the definition of 2EFJ expression evaluation. The evaluation
rules have the form σ, P ⊢ e ⇒ (e', M), where σ is an environment that maps
variables to values (which may be evaluated program parts), P is a set of pending
methods (methods that are currently being specialized), e is an expression that is
specialized into e', and M is a set of new methods generated by the specialization
of e.

The static parts of a 2EFJ expression reduce into values using a set of rules
that are counterparts to the standard EFJ evaluation rules of Figure 4, extended
to pass the set of pending methods inwards and collect specialized methods. For
example, the rule R-Cast of Figure 4 becomes

$$
\frac{\sigma, P \vdash e \Rightarrow (\mathit{object}_C(v_1,\ldots,v_n), M) \quad C <: D}
     {\sigma, P \vdash (D)e \Rightarrow (\mathit{object}_C(v_1,\ldots,v_n), M)}\;(\text{2RS-Cast})
$$

The 2EFJ counterpart of an EFJ evaluation rule R-x is named 2RS-x (Reduce
Static), giving the rules (2RS-Const), (2RS-Var), (2RS-New), (2RS-Field), (2RS-Invk),
(2RS-Cast), (2RS-OP), and (2RS-Cond). The evaluation rules for static 2EFJ expres-
sions are straightforward except for method invocations; the rule is basically
unchanged, but is used differently. A method invocation with a static self object
but a dynamic return value produces a residual expression that is unfolded
into the calling context; any arguments, be they values or residual program parts,
are substituted throughout the body of the method (2RS-Invk). Note that since
methods are only unfolded when the self is static, the unfolded body contains
no references to the fields of the self, and encapsulation is thus preserved.

With the exception of method invocation, all evaluation rules for dynamic
constructs are straightforward: each sub-component is reduced into a residual
expression, and used to rebuild the construct. There are two rules for evaluating
dynamic method invocations. The first rule (2RD-Invk-Memo) handles the case
where a specialized method that can be re-used is in the process of being gener-
ated, meaning that it is contained in the set of pending methods. In this case, a
call to this method is simply residualized. The function X used in the definition
of this rule evaluates each argument and the self, and collects specialized meth-
ods generated by this evaluation. In addition, it determines the indices of those
formal parameters that have a static binding-time annotation, and those that
have a dynamic binding-time annotation. The second rule (2RD-Invk-New) handles
the case where a set of new specialized methods must be generated. A set of
specialized methods, all with the same name, is generated using the function G,
and an invocation of a method of this name is residualized.

$$
\frac{\sigma, P \vdash e \Rightarrow (v, M) \quad c = \mathit{residualize}(v)}
     {\sigma, P \vdash \mathtt{lift}(e) \Rightarrow (c, M)}\;(\text{2R-Lift})
\qquad
\frac{}{\sigma, P \vdash \underline{x} \Rightarrow (x, \emptyset)}\;(\text{2RD-Var})
$$

$$
\frac{\forall i \in 1\ldots n\ \ \sigma, P \vdash e_i \Rightarrow (e'_i, M_i) \quad M' = \bigcup_i M_i}
     {\sigma, P \vdash \underline{\mathtt{new}}\ C(e_1,\ldots,e_n) \Rightarrow (\mathtt{new}\ C(e'_1,\ldots,e'_n), M')}\;(\text{2RD-New})
$$

$$
\frac{\sigma, P \vdash e \Rightarrow (e', M)}
     {\sigma, P \vdash e\underline{.f} \Rightarrow (e'.f, M)}\;(\text{2RD-Field})
\qquad
\frac{\sigma, P \vdash e \Rightarrow (e', M)}
     {\sigma, P \vdash \underline{(C)}e \Rightarrow ((C)e', M)}\;(\text{2RD-Cast})
$$

$$
\frac{\forall i \in \{0,1,2\}\ \ \sigma, P \vdash e_i \Rightarrow (e'_i, M_i) \quad M' = \bigcup_i M_i}
     {\sigma, P \vdash (e_0\,\underline{?}\,e_1\,\underline{:}\,e_2) \Rightarrow ((e'_0\,?\,e'_1 : e'_2), M')}\;(\text{2RD-Cond})
$$

$$
\frac{\forall i \in \{1,2\}\ \ \sigma, P \vdash e_i \Rightarrow (e'_i, M_i) \quad M' = \bigcup_i M_i}
     {\sigma, P \vdash e_1\ \underline{\mathit{OP}}\ e_2 \Rightarrow (e'_1\ \mathit{OP}\ e'_2, M')}\;(\text{2RD-OP})
$$

$$
\frac{\begin{array}{c}
X(\sigma, P, e\underline{.m}(d_1,\ldots,d_k)) = ([d'_1,\ldots,d'_k], C, I_S, I_D, e', M)\\
(C, m, [d'_i \mid i \in I_S], m') \in P \quad [p_1,\ldots,p_a] = I_D
\end{array}}
{\sigma, P \vdash e\underline{.m}(d_1,\ldots,d_k) \Rightarrow (e'.m'(d'_{p_1},\ldots,d'_{p_a}), M)}\;(\text{2RD-Invk-Memo})
$$

$$
\frac{\begin{array}{c}
X(\sigma, P, e\underline{.m}(d_1,\ldots,d_k)) = ([d'_1,\ldots,d'_k], C, I_S, I_D, e', M) \quad \neg\exists m' : (C, m, [d'_i \mid i \in I_S], m') \in P\\
G(C, m, I_S, I_D, [d'_i \mid i \in I_S], P) = (m'', M') \quad [p_1,\ldots,p_a] = I_D
\end{array}}
{\sigma, P \vdash e\underline{.m}(d_1,\ldots,d_k) \Rightarrow (e'.m''(d'_{p_1},\ldots,d'_{p_a}), M \cup M')}\;(\text{2RD-Invk-New})
$$

- Pending methods P: ({C1, ..., Cq}, m, [e1, ..., el], m')
- Methods produced M: (C, M)
- residualize(v) = residual representation of base-type value v
- Function X: Given (σ, P, e.m(d1, ..., dk)), return (E, C, I_S, I_D, e', M), where the
  list E contains the arguments (d1, ..., dk) evaluated, C is the set of possible
  classes of e, the list I_S contains the indices of the static formal parameters of m,
  the list I_D contains the indices of the dynamic formal parameters of m, e' is e
  evaluated, and M is the set of new specialized methods generated by evaluation
  of e and (d1, ..., dk).
- Function G: Given arguments as in rule (2RD-Invk-New), return (m'', M), where m''
  is the name of a new, specialized method, and M contains all specialized versions
  of m'' together with any specialized methods generated while specializing m''.
- The auxiliary functions X and G are defined in Figure 9.
- List comprehension notation: [x_i | i ∈ L] = list containing those x_i for which i ∈ L,
  ordered as in L

Fig. 8. Specialization of dynamic 2EFJ expressions

$$
\frac{\begin{array}{c}
\forall i \in 1\ldots k\ \ \sigma, P \vdash d_i \Rightarrow (d'_i, M_i) \quad M' = \bigcup_i M_i \quad \mathit{type}(e) = \{C_1,\ldots,C_q\}\\
I_S = \textit{S-indices}(C_1, m) \quad I_D = \textit{D-indices}(C_1, m) \quad \sigma, P \vdash e \Rightarrow (e', M'')
\end{array}}
{X(\sigma, P, e\underline{.m}(d_1,\ldots,d_k)) = ([d'_1,\ldots,d'_k], \{C_1,\ldots,C_q\}, I_S, I_D, e', M' \cup M'')}
$$

$$
\frac{\begin{array}{c}
m'' = \mathit{gensym}(m) \quad P' = \{(\{C_1,\ldots,C_q\}, m, [d'_i \mid i \in I_S], m'')\} \quad T = \textit{return-type}(C_1, m) \quad [p_1,\ldots,p_a] = I_D\\
\forall j \in 1\ldots q:\ \ \mathit{mbody}(C_j, m) = ((x^j_1,\ldots,x^j_k), e_j) \quad \sigma'_j = [x^j_i \mapsto d'_i \mid i \in I_S]\\
\sigma'_j, P \cup P' \vdash e_j \Rightarrow (e'_j, M_j) \quad m_j = (C_j,\ T\ m''(x^j_{p_1},\ldots,x^j_{p_a})\ \{\mathtt{return}\ e'_j;\})\\
M' = \bigcup_j M_j
\end{array}}
{G(\{C_1,\ldots,C_q\}, m, I_S, I_D, [d'_i \mid i \in I_S], P) = (m'', M' \cup \{m_1,\ldots,m_q\})}
$$

S-indices(C, m) = list of indices of static formal parameters of method m of class C
D-indices(C, m) = list of indices of dynamic formal parameters of method m of class C
gensym(m) = uniquely generated method name based on m
return-type(C, m) = return type of method m in class C

Fig. 9. Auxiliary definitions for Figure 8

The function G generates a new method name, and uses it together with
the static evaluated arguments to extend the set of pending methods. The body
of each potential receiver method is determined, and an environment that maps
the static formal parameters to the corresponding arguments is constructed for
each method. Each body is then evaluated, and used to construct a member of
the set of specialized methods.
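The role of the pending set can be illustrated by hand-specializing the Base_Known scenario of Figure 2. The Java sketch below is hard-wired to Power.loop and Binary.e rather than being a general specializer, and the class MiniSpecializer and its naming scheme (suffixing the static argument values, one possible gensym) are assumptions. specLoop follows 2RD-Invk-New: it records the pending method before specializing the body, so the recursive call to loop hits the pending entry (as in 2RD-Invk-Memo) and specialization terminates; specE generates one specialized e method per possible receiver class. Unlike the set P of Figure 8, the map is never shrunk, so it also acts as the cache against duplicate specialized methods mentioned below.

```java
import java.util.*;

class MiniSpecializer {
    // Pending methods P: (method name, static arguments) -> specialized name.
    private final Map<List<Object>, String> pending = new HashMap<>();
    // Produced methods M, as {class name, residual method source} pairs.
    final List<String[]> produced = new ArrayList<>();

    // One possible gensym: suffix the static argument values to the method name.
    private String gensym(String m, List<Object> statics) {
        StringBuilder sb = new StringBuilder(m);
        for (Object s : statics) sb.append('_').append(s);
        return sb.toString();
    }

    // this.op.e(base, y) with base static and receiver and y dynamic:
    // one specialized method per possible receiver class, dispatch residualized.
    String specE(int base) {
        List<Object> key = List.of("e", base);
        String name = pending.get(key);
        if (name != null) return name;                       // 2RD-Invk-Memo
        name = gensym("e", List.of(base));
        pending.put(key, name);                              // extend P before specializing
        produced.add(new String[] {"Binary", "int " + name + "(int y) { return this." + name + "(y); }"});
        produced.add(new String[] {"Add", "int " + name + "(int y) { return " + base + "+y; }"});
        produced.add(new String[] {"Mul", "int " + name + "(int y) { return " + base + "*y; }"});
        return name;
    }

    // this.loop(base, x) with base static and x dynamic.
    String specLoop(int base) {
        List<Object> key = List.of("loop", base);
        String name = pending.get(key);
        if (name != null) return name;                       // 2RD-Invk-Memo
        name = gensym("loop", List.of(base));
        pending.put(key, name);                              // extend P before specializing
        String e = specE(base);
        String rec = specLoop(base);                         // terminates: hits the pending entry
        produced.add(new String[] {"Power", "int " + name + "(int x) { return x==0 ? this.neutral : this.op."
                + e + "(this." + rec + "(x-1)); }"});
        return name;
    }
}
```

Calling specLoop(2) produces e_2 for Binary, Add and Mul and then loop_2 for Power, whose residual body is the one shown in the Base_Known aspect of Figure 2.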

The evaluation rules for method invocation can be improved in a number 
of ways. Let-blocks can be used to avoid code duplication when methods with 
dynamic formal parameters are unfolded (although let-blocks would have to 
be added to the language), and a cache can be introduced to avoid generating 
duplicate specialized methods. These problems and their solution are well-known 
from partial evaluation for functional languages, and will not be discussed. 

6.2 Evaluation of a Program 

Evaluation of a 2EFJ program produces a specialized main expression and a
collection of specialized methods; this representation can be transformed into the
aspect syntax of Figure 10a. We use introduction blocks to introduce specialized
methods into classes, and a special main block to replace the main expression of
a program. The rules for transforming the tuple resulting from 2EFJ evaluation
into an aspect are shown in Figure 10b. The aspect produced by specialization
can be woven into the main program using a simple weaver weave, defined by the
evaluation rule of Figure 10c. The overall effect is that each specialized method
is inserted into the class for which it was specialized, and that the generic main
expression is replaced by the specialized main expression.

$$
\begin{array}{lcl}
A \in \mathit{Aspect} & ::= & \mathtt{aspect}\ N\ \{\ I_1 \ldots I_n\ \mathtt{main}\ e\ \}\\
I \in \mathit{Introduction} & ::= & \mathtt{introduction}\ C\ \{\ M_1 \ldots M_n\ \}\\
N \in \textit{Aspect-name} & &
\end{array}
$$

(a) Aspect syntax for program with main expression

$$
\frac{\begin{array}{c}
\sigma_0, \emptyset \vdash e \Rightarrow (e', M) \quad \forall i \in 1\ldots n:\ \ CL_i = \mathtt{class}\ C_i\ \mathtt{extends}\ D_i\ \{\ldots;\ K\ M_1\ldots M_k\}\\
\{M'^{\,i}_1,\ldots,M'^{\,i}_{l_i}\} = \{m \mid (C_i, m) \in M\} \quad I_i = \mathtt{introduction}\ C_i\ \{\ M'^{\,i}_1 \ldots M'^{\,i}_{l_i}\ \}
\end{array}}
{\sigma_0 \vdash (\{CL_1,\ldots,CL_n\}, e) \Rightarrow \mathtt{aspect}\ \{\ I_1 \ldots I_n\ \mathtt{main}\ e'\ \}}
$$

(b) Specialization into an aspect

$$
\frac{\begin{array}{c}
\forall i \in 1\ldots n:\ \ I_i = \mathtt{introduction}\ C_i\ \{\ M'_1 \ldots M'_l\ \} \quad CL_i = \mathtt{class}\ C_i\ \mathtt{extends}\ D_i\ \{\ldots;\ K\ M_1 \ldots M_p\}\\
CL'_i = \mathtt{class}\ C_i\ \mathtt{extends}\ D_i\ \{\ldots;\ K\ M_1 \ldots M_p\ M'_1 \ldots M'_l\}
\end{array}}
{\mathit{weave}((\{CL_1,\ldots,CL_n\}, e),\ \mathtt{aspect}\ \{\ I_1 \ldots I_n\ \mathtt{main}\ e'\ \}) \to (\{CL'_1,\ldots,CL'_n\}, e')}
$$

(c) Weaving of aspect and program

Fig. 10. Specialization of a program into an aspect
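The weaving rule of Figure 10c is essentially list concatenation per class plus replacement of the main expression. The Java sketch below models this under a deliberately crude, hypothetical representation (records Program and Aspect, with each class reduced to the list of its method sources and the main expression kept as a string); it is not an AspectJ weaver.

```java
import java.util.*;

// Hypothetical representation: a class is just the list of its method sources.
record Program(Map<String, List<String>> classes, String main) {}
record Aspect(Map<String, List<String>> introductions, String main) {}

class Weaver {
    // weave(({CL1..CLn}, e), aspect { I1 .. In main e' }) = ({CL'1..CL'n}, e')
    static Program weave(Program p, Aspect a) {
        Map<String, List<String>> woven = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> cl : p.classes().entrySet()) {
            List<String> methods = new ArrayList<>(cl.getValue());                   // M1 ... Mp
            methods.addAll(a.introductions().getOrDefault(cl.getKey(), List.of()));  // M'1 ... M'l
            woven.put(cl.getKey(), methods);
        }
        return new Program(woven, a.main());   // generic main e replaced by specialized e'
    }
}
```

The input program is left untouched, which mirrors the point made earlier that the specialized code can be plugged in or left out simply by choosing whether to weave the aspect.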

7 Binding-Time Analysis 

Binding-time analysis of an EFJ program constructs a well-annotated 2EFJ 
program. The binding-time analysis is supplied the binding times of the free 
variables of the main expression, and the derived annotations must respect the 
well-annotatedness rules and should make static as much of the program as is 
possible. We express the binding-time analysis as constraints on the binding 
times of the program, and then use a constraint solver to find a consistent solu- 
tion that assigns binding times to the program. 

7.1 Constraint System 

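We generate one or more constraints for every program part. Constraint vari-
ables are associated with expressions, classes, method returns, method formal
parameters, and the free variables of the program main expression. A constraint
variable T_e constrains the binding time of an expression e, T_C constrains the
binding time of a class C, and for a method m in the class C with formal parame-
ters x_1, ..., x_n, T_C.m.x_j constrains the binding time of each method parameter and
T_C.m.return constrains the binding time of the method return value (the binding
time of the self argument is constrained by T_C). The constraint variable T_□.□.x
constrains a free variable x of the main expression of the program. Apart from
equality between binding times, we use the operator ≼ to constrain a liftable
expression, and the operator T_1 ⊳ T_2 to express a dependency between T_1 and
T_2, as defined in Figure 11. To express constraints on an expression e that is
liftable, T_ē is used to represent the binding time of the context of e, and the
operator ≼ is used to relate the binding time T_e to T_ē.

$$
\mathcal{C}^E(C, m, e) = \text{case } e \text{ of}
$$

$$
\begin{array}{lll}
[\![c]\!] & \Rightarrow & \{T_e = S\}\\
[\![x]\!] & \Rightarrow & \{T_e = T_{C.m.x}\}\\
[\![e_0.f]\!] & \Rightarrow & \{T_e = T_{e_0},\ T_{e_0} = T_{C_i}\} \cup \mathcal{C}^E(C, m, e_0), \quad \mathit{type}(e_0) = \{C_1,\ldots,C_p\},\ \forall i \in 1\ldots p\\
[\![\mathtt{new}\ D(e_1,\ldots,e_n)]\!] & \Rightarrow & \{L(T_{e_i}, T_{\bar{e}_i}),\ T_{\bar{e}_i} = T_D,\ T_e = T_D\} \cup \mathcal{C}^E(C, m, e_i), \quad \forall i \in 1\ldots n\\
[\![(D)e_1]\!] & \Rightarrow & \{T_e = T_{e_1},\ T_e = T_D\} \cup \mathcal{C}^E(C, m, e_1)\\
[\![e_0\ \mathit{OP}\ e_1]\!] & \Rightarrow & \{L(T_{e_i}, T_{\bar{e}_i}),\ T_{\bar{e}_i} = T_e\} \cup \mathcal{C}^E(C, m, e_i), \quad \forall i \in \{0,1\}\\
[\![(e_0\,?\,e_1 : e_2)]\!] & \Rightarrow & \{L(T_{e_i}, T_{\bar{e}_i}),\ T_{\bar{e}_0} \rhd T_{\bar{e}_1},\ T_{\bar{e}_0} \rhd T_{\bar{e}_2},\ T_{\bar{e}_1} = T_{\bar{e}_2},\ T_e = T_{\bar{e}_1} = T_{\bar{e}_2}\}\\
 & & \quad \cup\ \mathcal{C}^E(C, m, e_i), \quad \forall i \in \{0,1,2\}\\
[\![e_0.m'(d_1,\ldots,d_n)]\!] & \Rightarrow & \{L(T_{d_i}, T_{\bar{d}_i}),\ T_{\bar{d}_i} = T_{C_j.m'.x_i},\ T_{e_0} = T_{C_j},\ T_e = T_{C_j.m'.\mathrm{return}}\}\\
 & & \quad \cup\ \mathcal{C}^E(C, m, e_0) \cup \mathcal{C}^E(C, m, d_i), \quad \mathit{type}(e_0) = \{C_1,\ldots,C_p\},\ \forall i \in 1\ldots n,\ \forall j \in 1\ldots p
\end{array}
$$

$$
\mathcal{C}^M(C, m, e) = \{L(T_e, T_{\bar{e}}),\ T_{C.m.\mathrm{return}} = T_{\bar{e}},\ T_C \rhd T_{C.m.\mathrm{return}}\} \cup \mathcal{C}^E(C, m, e)
$$

$$
\mathcal{C}^C(\mathtt{class}\ C\ \mathtt{extends}\ D\ \{\ldots;\ K\ M_1\ldots M_n\}) = \bigcup_i \mathcal{C}^M(C, m_i, e_i), \quad M_i = T\ m_i(\ldots)\ \{\ \mathtt{return}\ e_i;\ \},\ \forall i \in 1\ldots n
$$

$$
\mathcal{C}^P(\{CL_1,\ldots,CL_n\}, e, \Gamma) = \Gamma \cup \mathcal{C}^E(\square, \square, e) \cup \Big(\bigcup_i \mathcal{C}^C(CL_i)\Big), \quad \forall i \in 1\ldots n
$$

Constraints:

$$
T_1 \rhd T_2 \iff (T_1 = D \Rightarrow T_2 = D) \qquad S \prec D \qquad
L(T_e, T_{\bar{e}}) = \begin{cases} T_e \preccurlyeq T_{\bar{e}}, & \mathit{type}(e) \in \{\mathtt{int}, \mathtt{boolean}\}\\ T_e = T_{\bar{e}}, & \text{otherwise} \end{cases}
$$

Fig. 11. Constraint generation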

The binding-time constraints for an expression $e$ in a method $m$ of the class $C$ are generated using $\mathcal{C}^{e}(C, m, e)$, defined in Figure 11. Constraint generation is straightforward given the well-annotatedness rules, except for the function $L$. The function $L$ expresses that base-type values may be lifted, and is used to generate constraints for expressions that occur in a context where it may be useful to lift them. To generate constraints for a program, constraints are generated for all methods of all classes.
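To make the shape of Figure 11 concrete, here is a minimal Java sketch of constraint generation for a hypothetical fragment with only integer constants, variables and binary operators, so every operand is base-typed and $L$ always yields a lift constraint. The record names, the label scheme, and the string encoding of constraint variables (`T_<label>` for $T_e$, `Tbar_<label>` for $T_{\bar{e}}$, `T_var_<x>` for a free variable) are assumptions made for illustration; classes, fields and method calls are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-AST: every node carries a unique label that names its
// constraint variables T_<label> (binding time) and Tbar_<label> (context).
interface Expr { String label(); }
record Const(String label, int value) implements Expr {}
record Var(String label, String name) implements Expr {}
record Bin(String label, char op, Expr left, Expr right) implements Expr {}

// "lhs kind rhs" where kind is "=", "|>" (dependency) or "<=" (lhs may be lifted to rhs).
record Constraint(String lhs, String kind, String rhs) {}

final class ConstraintGen {
    static String t(Expr e) { return "T_" + e.label(); }
    static String tbar(Expr e) { return "Tbar_" + e.label(); }

    // The function L, specialised to base-type expressions (all expressions here are ints).
    static Constraint liftable(Expr e) { return new Constraint(t(e), "<=", tbar(e)); }

    static List<Constraint> gen(Expr e) {
        List<Constraint> out = new ArrayList<>();
        gen(e, out);
        return out;
    }

    private static void gen(Expr e, List<Constraint> out) {
        if (e instanceof Const c) {
            out.add(new Constraint(t(c), "=", "S"));                 // [[c]] => {T_e = S}
        } else if (e instanceof Var v) {
            out.add(new Constraint(t(v), "=", "T_var_" + v.name())); // [[x]] => {T_e = T_x}
        } else if (e instanceof Bin b) {
            for (Expr operand : List.of(b.left(), b.right())) {      // [[e0 OP e1]]
                out.add(liftable(operand));                          // L(T_ei, Tbar_ei)
                out.add(new Constraint(tbar(operand), "=", t(b)));   // Tbar_ei = T_e
                gen(operand, out);
            }
        } else {
            throw new IllegalArgumentException("unsupported expression: " + e);
        }
    }
}
```

For `x + 3` this yields six constraints: a lift constraint and a context equation for each operand, plus `T_c1 = S` for the constant and `T_x1 = T_var_x` for the variable.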

7.2 Constraint Solving 

To efficiently solve the constraint system generated for an EFJ program, we can directly use the constraint solver of the C-Mix partial evaluator for C [?]. We only use the operators $=$, $\rhd$, and $\preceq$, all of which are identical in C-Mix. Solving our constraint system does not generate new forms of binding times, even though the C-Mix constraint solver treats a richer set of binding times. The solution produced by the C-Mix constraint solver constrains all program parts that need to be annotated dynamic (e.g., $T_e = D$ in the solution means $e$ is annotated dynamic); all other program parts are assigned static binding time. A constraint $T_e \preceq T_{\bar{e}}$ in the solution indicates that a lift should be inserted around the expression $e$.
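The constraints can also be solved by simply propagating D. The sketch below is not the C-Mix solver; it is a naive fixpoint iteration computing the least solution, assuming the reading of the operators given in Figure 11 ($T_1 \rhd T_2$ means $T_1 = D \Rightarrow T_2 = D$) and reading $T_1 \preceq T_2$ as $T_1 \le T_2$ under $S \prec D$, which propagates D in the same direction. Constraint variables are plain strings and `S`/`D` appear as constants, so an initial-environment entry is just an equation such as `T_var_x = D`; all names are hypothetical.

```java
import java.util.*;

// Binding times, ordered S < D.
enum BT { S, D }

// "lhs kind rhs" with kind "=", "|>" (dependency) or "<=" (liftable); "S"/"D" are constants.
record Con(String lhs, String kind, String rhs) {}

final class BtSolver {
    // Least solution: every variable starts at S and D is propagated until nothing changes.
    static Map<String, BT> solve(List<Con> cs) {
        Map<String, BT> bt = new HashMap<>();
        bt.put("S", BT.S);
        bt.put("D", BT.D);
        for (Con c : cs) {
            bt.putIfAbsent(c.lhs(), BT.S);
            bt.putIfAbsent(c.rhs(), BT.S);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Con c : cs) {
                boolean l = bt.get(c.lhs()) == BT.D;
                boolean r = bt.get(c.rhs()) == BT.D;
                switch (c.kind()) {
                    case "=" -> {
                        if (l && !r) changed |= makeDynamic(bt, c.rhs());
                        if (r && !l) changed |= makeDynamic(bt, c.lhs());
                    }
                    // T1 |> T2 and T1 <= T2 both force T2 = D whenever T1 = D.
                    case "|>", "<=" -> { if (l && !r) changed |= makeDynamic(bt, c.rhs()); }
                    default -> throw new IllegalArgumentException("unknown kind " + c.kind());
                }
            }
        }
        return bt;
    }

    private static boolean makeDynamic(Map<String, BT> bt, String v) {
        if (v.equals("S")) throw new IllegalStateException("unsatisfiable: D forced into S");
        bt.put(v, BT.D);
        return true;
    }

    // A lift constraint T_e <= Tbar_e solved as (S, D) marks a lift around e.
    static List<String> lifts(List<Con> cs, Map<String, BT> bt) {
        List<String> out = new ArrayList<>();
        for (Con c : cs)
            if (c.kind().equals("<=") && bt.get(c.lhs()) == BT.S && bt.get(c.rhs()) == BT.D)
                out.add(c.lhs());
        return out;
    }
}
```

For `x + 3` with `x` dynamic, the solver makes the sum and both operand contexts dynamic while the constant itself stays static, so `lifts` reports exactly the constant.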
8 Examples

To illustrate how the partial evaluator presented in the previous sections can specialize object-oriented programs, we apply it to the power example presented in Section 2 and to an example written using the visitor design pattern [?].

8.1 Power 

The power example of Section 2 was specialized informally in three different ways (as shown in Figures 1 and 2). The first specialization scenario, with a known exponent and unknown neutral value and binary operator, cannot be reproduced by our partial evaluator, since it requires partially static objects. We can, however, reproduce the second scenario (where the aspect Exp_Op_Neutral_Known shown in Figure 2 was produced). Given the main expression

(new Power(i1, i2, b ? (Binary)new Add() : (Binary)new Mul())).raise(x)

the binding-time analysis is done in an initial environment that annotates the free variables i1, i2, and b as static and the free variable x as dynamic. All classes of the program are annotated static, and hence all method invocations and field accesses are specialized away. The result of specializing is simply the aspect

aspect Exp_Op_Neutral_Known { main x*x*x*1 }
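Figure 1 is not part of this excerpt, so the following plain-Java sketch reconstructs the power example under explicit assumptions: `i1` is the exponent, `i2` the neutral value, `Binary.e` the operator method, and `raise` unfolds to `x op (x op (... op neutral))`. With `i1 = 3`, `i2 = 1` and `b = false` (multiplication), everything except `x` is known, which is why the residual program is the straight-line product of the aspect above; `PowerExample` and its methods are hypothetical names.

```java
// Hypothetical stand-ins for the Binary classes of Figure 1.
abstract class Binary { abstract int e(int x, int y); }
class Add extends Binary { int e(int x, int y) { return x + y; } }
class Mul extends Binary { int e(int x, int y) { return x * y; } }

// Assumed shape of Power: raise(x) combines exp copies of x with op, ending in neutral.
class Power {
    final int exp, neutral;
    final Binary op;
    Power(int exp, int neutral, Binary op) { this.exp = exp; this.neutral = neutral; this.op = op; }
    int raise(int base) { return loop(base, exp); }
    private int loop(int base, int n) { return n == 0 ? neutral : op.e(base, loop(base, n - 1)); }
}

final class PowerExample {
    // Generic main expression: (new Power(i1, i2, b ? new Add() : new Mul())).raise(x)
    static int generic(int i1, int i2, boolean b, int x) {
        return new Power(i1, i2, b ? new Add() : new Mul()).raise(x);
    }

    // Residual main expression of Exp_Op_Neutral_Known (i1 = 3, i2 = 1, b = false).
    static int residual(int x) { return x * x * x * 1; }
}
```

Comparing `generic(3, 1, false, x)` with `residual(x)` for a few values of `x` is a quick way to check that the specialization preserves the meaning of the main expression.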

In the last scenario (where the aspect Base_Known was produced), the base value (the variable x in the main expression above) is static, and all other free variables are dynamic. The binding-time analysis derives a program where all classes of the program are annotated dynamic, and the first argument to the e methods is annotated as static. The result of specializing is equivalent to the aspect Base_Known, except that the body of the method raise_2 is the specialized main expression.

8.2 Visitor 

The visitor design pattern is a way of specifying an operation to be performed on the elements of an object structure externally to the classes that define this structure [?].

A sample implementation of a tree structure and two visitors is shown in 
Figure 12, using a slightly relaxed syntax. The class Tree is the abstract super- 
class that defines the interface for accepting visitors (of type TreeVisitor), and 
it has concrete subclasses Node and Leaf. The visitor CountOcc counts the number 
of occurrences of a given element in the tree, and the visitor FoldBinOp folds a 
binary operator (from Figure 1) over the nodes of the tree. 

We can specialize the program to a specific visitor type, as follows. We use an initial environment that defines the binding times of the free variables of the main expression shown in Figure 12: the tree t is dynamic, the booleans b1 and b2 are static, and the integer i is static. The binding-time analysis infers that
class Tree {

  int accept(TreeVisitor v) { return this.accept(v); }
}

class Node extends Tree {
  Tree left, right;
  Node(Tree x, Tree y) { this.left=x; this.right=y; }
  int accept(TreeVisitor v) { return v.visitNode(this); }
}

class Leaf extends Tree {
  int val;
  Leaf(int x) { this.val=x; }
  int accept(TreeVisitor t) { return t.visitLeaf(this); }
}

class TreeVisitor {
  int visitLeaf(Leaf f) { return this.visitLeaf(f); }
  int visitNode(Node n) { return this.visitNode(n); }
}

class CountOcc extends TreeVisitor {
  int elm;
  CountOcc(int x) { this.elm = x; }
  int visitLeaf(Leaf l) { return this.elm==l.val ? 1 : 0; }
  int visitNode(Node n) {
    return n.left.accept(this)+n.right.accept(this);
  }
}

class FoldBinOp extends TreeVisitor {
  BinOp op;
  FoldBinOp(BinOp x) { this.op = x; }
  int visitLeaf(Leaf l) { return l.val; }
  int visitNode(Node n) {
    return op.e(n.left.accept(this), n.right.accept(this));
  }
}

t.accept( b1 ? (TreeVisitor)new CountOcc(i)
             : (TreeVisitor)new FoldBinOp( b2 ? new Add()
                                             : new Mul() ))

Fig. 12. Source code for tree structure and a few visitors



the class Tree and its subclasses are dynamic, that the classes TreeVisitor and Binary and their subclasses are static, and that all return values and operator applications are dynamic. Specializing the program with the variable b1 as true and the variable i as 2 yields the aspect Count_2 shown in Figure 13. Conversely, specializing the program with the variables b1 and b2 as false yields the aspect Fold_Mul (again shown in Figure 13). In both cases the extra virtual dispatch
aspect Count_2 {
  introduction Leaf {
    int accept1() { return this.val==2 ? 1 : 0; }
  }
  introduction Node {
    int accept1() { return this.left.accept1()+this.right.accept1(); }
  }
  main t.accept1()
}

aspect Fold_Mul {
  introduction Leaf {
    int accept2() { return this.val; }
  }
  introduction Node {
    int accept2() { return this.left.accept2()*this.right.accept2(); }
  }
  main t.accept2()
}

Fig. 13. Specializations of program of Figure 12



needed to select the visitor has been removed, and the implementation of the visitor has been unfolded into the accept methods.
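As a sanity check on what the residual aspects of Figure 13 compute, the sketch below renders Figure 12 in plain Java and writes Count_2 and Fold_Mul by hand. It is an illustration, not partial-evaluator output: EFJ's self-calling placeholder methods become real abstract methods, `BinOp` is assumed to be a functional interface with a method `e`, lambdas stand in for `Add` and `Mul`, and the helpers `generic`, `count2` and `foldMul` are hypothetical. Because plain Java has no aspect introductions, the per-class `accept1`/`accept2` methods are emulated with a type test.

```java
abstract class Tree { abstract int accept(TreeVisitor v); }

class Node extends Tree {
    final Tree left, right;
    Node(Tree x, Tree y) { left = x; right = y; }
    int accept(TreeVisitor v) { return v.visitNode(this); }
}

class Leaf extends Tree {
    final int val;
    Leaf(int x) { val = x; }
    int accept(TreeVisitor v) { return v.visitLeaf(this); }
}

abstract class TreeVisitor {
    abstract int visitLeaf(Leaf l);
    abstract int visitNode(Node n);
}

class CountOcc extends TreeVisitor {
    final int elm;
    CountOcc(int x) { elm = x; }
    int visitLeaf(Leaf l) { return elm == l.val ? 1 : 0; }
    int visitNode(Node n) { return n.left.accept(this) + n.right.accept(this); }
}

interface BinOp { int e(int x, int y); }

class FoldBinOp extends TreeVisitor {
    final BinOp op;
    FoldBinOp(BinOp x) { op = x; }
    int visitLeaf(Leaf l) { return l.val; }
    int visitNode(Node n) { return op.e(n.left.accept(this), n.right.accept(this)); }
}

final class VisitorExample {
    // The main expression of Figure 12.
    static int generic(Tree t, boolean b1, boolean b2, int i) {
        TreeVisitor v = b1 ? new CountOcc(i)
                           : new FoldBinOp(b2 ? (x, y) -> x + y : (x, y) -> x * y);
        return t.accept(v);
    }

    // Count_2 by hand: no visitor object and no visitor dispatch remain.
    static int count2(Tree t) {
        if (t instanceof Leaf l) return l.val == 2 ? 1 : 0;
        Node n = (Node) t;
        return count2(n.left) + count2(n.right);
    }

    // Fold_Mul by hand: the multiplication is unfolded into the traversal.
    static int foldMul(Tree t) {
        if (t instanceof Leaf l) return l.val;
        Node n = (Node) t;
        return foldMul(n.left) * foldMul(n.right);
    }
}
```

On a tree with leaves 2, 3 and 2, `generic(t, true, false, 2)` and `count2(t)` both count two occurrences, while `generic(t, false, false, 0)` and `foldMul(t)` both fold to 12.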



9 Scaling Up to Realistic Java Programs 

The EFJ partial evaluator presented in this paper can be extended using existing
techniques to specialize realistic Java programs in a useful way. In this section, 
we first discuss the needed extensions, and then describe a complete partial 
evaluator for Java implemented according to these guidelines. 



9.1 Improving the EFJ Partial Evaluator 

Perhaps the most obvious extension to the partial evaluator is the treatment of side-effects. Such an extension is straightforward, since existing techniques for first-order imperative languages (such as C) can be used: in our analysis and specialization framework we essentially treat a virtual dispatch as a conditional that selects the receiver method based on the type of the receiver object, and thus no higher-order functions are needed. The challenge lies not in dealing with side-effects, but in defining a binding-time analysis that is sufficiently precise for specializing realistic programs.
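The remark that a virtual dispatch can be treated as a conditional on the receiver's class is easy to illustrate. In this hypothetical Java sketch (the `Binary`, `Add` and `Mul` definitions are assumed, after the power example), `conditional` spells out the dispatch of `op.e(x, y)` as a first-order type test over a known set of receiver classes, the set that `type(e)` provides in the analysis, with each candidate method body inlined in its branch. When the receiver's class is static, a partial evaluator can pick the branch at specialization time.

```java
// Hypothetical receiver hierarchy, named after the Binary classes of the power example.
abstract class Binary { abstract int e(int x, int y); }
class Add extends Binary { int e(int x, int y) { return x + y; } }
class Mul extends Binary { int e(int x, int y) { return x * y; } }

final class DispatchAsConditional {
    // Ordinary virtual dispatch.
    static int virtualCall(Binary op, int x, int y) { return op.e(x, y); }

    // The same call as a first-order conditional on the receiver's class, with each
    // candidate method body inlined in its branch; the set of candidate classes is
    // assumed to be known, as type(e) provides it in the analysis.
    static int conditional(Binary op, int x, int y) {
        if (op instanceof Add) return x + y;
        if (op instanceof Mul) return x * y;
        throw new IllegalArgumentException("unexpected receiver class " + op.getClass().getName());
    }
}
```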

To scale up partial evaluation to realistic programs, we must take into account the patterns of programming often found in object-oriented languages. In a large program, different instances of the same class are often used for different purposes, and will thus often need to be assigned different binding times. Thus, each instance of a given class must be assigned a binding time independently of the other instances of this class. Since an object is usually manipulated through its methods, each invocation of these methods must also be assigned binding times individually. Thus, both class-polyvariance (individual treatment of the instances of each class) and method-polyvariance are needed. In a language with constructors, these must be treated polyvariantly as well. In a language with side-effects, an alias analysis is needed to determine the set of locations manipulated at each program point. The precision of the alias analysis must match the precision of the binding-time analysis, which means that it too needs to be class-polyvariant and method-polyvariant. Furthermore, it is essential to permit partially static objects, which for example can be done with use-sensitivity [?].
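The cost of class-monovariance can be seen in a toy model. Approximating "instances" by allocation sites (a common approximation, assumed here), a monovariant analysis must give every instance of a class the least upper bound of the binding times required at all of its sites, whereas a class-polyvariant one keeps them apart. The `Site` record and the per-site scheme are hypothetical and say nothing about how Tempo or JSpec represent binding times.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Binding times ordered S < D; join is the least upper bound.
enum Bt {
    S, D;
    Bt join(Bt other) { return this == D || other == D ? D : S; }
}

// Hypothetical allocation site: an occurrence of "new className(...)" together with
// the binding time required by the uses of the objects created there.
record Site(String id, String className, Bt required) {}

final class Polyvariance {
    // Class-monovariant: one binding time per class, the join over all of its sites.
    static Map<String, Bt> perClass(List<Site> sites) {
        Map<String, Bt> result = new HashMap<>();
        for (Site s : sites) result.merge(s.className(), s.required(), Bt::join);
        return result;
    }

    // Class-polyvariant (approximated per allocation site): each site keeps its own binding time.
    static Map<String, Bt> perSite(List<Site> sites) {
        Map<String, Bt> result = new HashMap<>();
        for (Site s : sites) result.put(s.id(), s.required());
        return result;
    }
}
```

With one `Power` object whose exponent is known and another whose exponent is not, the monovariant result makes every `Power` dynamic, losing the specialization opportunity offered by the first object.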

There are no formalizations of class- and method-polyvariant binding-time analysis with use-sensitive binding times for languages with side-effects. Nevertheless, such a binding-time analysis has been implemented in the Tempo partial evaluator for the C language and is as such “known technology” (the binding time is polyvariant across C structure instances, which is equivalent to class polyvariance). Polyvariant alias analyses have been studied for imperative languages both formally and in practice, and are well-documented in the literature [?].



9.2 Partial Evaluation for Java 

We have implemented a complete partial evaluator for Java; this partial evaluator is based on the principles presented in this paper, extended (mostly) as described in Section 9.1 to support larger and more realistic programs. Our partial evaluator, named JSpec, treats the entire Java language excluding exception handlers, although with restrictions on the more exotic features of Java, such as multi-threading and dynamic loading. JSpec has been shown to give significant speedups when applied to large programs written in Java, across different machine architectures and execution environments [?]. However, specialization of large programs written in Java is difficult in practice, due to the complexity of obtaining a satisfactory binding-time division; we are looking to specialization classes [?] and specialization patterns to help guide the specialization process.



10 Related Work 

Marquard and Steensgaard have demonstrated the feasibility of on-line partial evaluation for object-oriented languages [?]. They developed a partial evaluator for a small object-based object-oriented language based on Emerald. However, the primary focus is on issues in on-line partial evaluation, such as termination and resource consumption during specialization. There is no description of how partial evaluation should specialize an object-oriented program, and virtually no description of how their partial evaluator handles object-oriented language features.

Partial evaluation can be done based on constructor parameters at run time for C++ programs, as shown by Fujinami [?]. Annotations are used to indicate member methods that are to be specialized. A method is specialized using standard partial evaluation techniques for C and by replacing virtual dispatches through static object references by direct method invocations. Furthermore, if such a method has been tagged as inline, it is inlined into the caller method and further specialized. This approach to partial evaluation for an object-oriented language concentrates on specializing individual objects. In contrast, we specialize the interaction that takes place between multiple objects based on their respective state, resulting in global specialization of the program.

Veldhuizen has demonstrated that templates in C++ can be used to perform partial evaluation at compile time [?]. By using a combination of template parameters and C++ const constant declarations, arbitrary computations over base type values can be performed at compile time. Nevertheless, specialization with C++ templates is limited in a number of ways: the values that can be manipulated are restricted, the computations that can be simplified are limited, and an explicit two-level syntax must be used to write programs. As a consequence of this last limitation, binding-time analysis must be performed manually, and functionality must be implemented twice if both a generic and a specialized behavior are needed.

Customization and selective argument specialization are highly aggressive yet general-purpose object-oriented compiler optimizations [?]. Selective argument specialization (the more general of the two optimization techniques) specializes methods to known type information about their arguments. Specialization is done by eliminating virtual dispatches over objects with known types, similar to partial evaluation. However, there is no dependence on static information, since type information and execution time information are dynamically gathered to control optimizations. Compared to these optimizations, partial evaluation for object-oriented languages is more thorough and more aggressive, but also less general: it propagates values of any type globally throughout the program and reduces any computation that depends only on known information, but does not optimize program parts where no static information is available. In fact, customization and selective argument specialization are often complementary to partial evaluation, since they can be used to optimize program parts of a more dynamic nature, where no static information is known.

11 Conclusion and Future Work 

Given the widespread popularity of object-oriented languages and the performance problems associated with frequent use of object-oriented abstractions, we expect that partial evaluation can be a useful software engineering tool when implementing object-oriented software. In this paper, we have given a formal definition of partial evaluation for a minimal class-based object-oriented language, and thus made clear how partial evaluation can specialize object-oriented programs. Furthermore, we have described how this minimal partial evaluator can be extended using known partial evaluation techniques to specialize realistic object-oriented programs.

We leave as future work the formal proof of correctness of our partial eval- 
uator. Also, we have concentrated on class-based object-oriented languages. 
Nonetheless, we consider object-based languages to be an interesting target for 
partial evaluation, and are working on giving a concise definition of partial eval- 
uation for such languages. 
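Class table lookup:

$$
\frac{(\{CL_1, \ldots, CL_n\}, e) \text{ is the current program} \qquad \texttt{class}\ C\ \texttt{extends}\ D\ \{\ldots\} \in \{CL_1, \ldots, CL_n\}}
     {CT(C) = \texttt{class}\ C\ \texttt{extends}\ D\ \{\ldots\}}
$$

Field lookup:

$$
\mathit{fields}(\texttt{Object}) = \bullet
\qquad\qquad
\frac{CT(C) = \texttt{class}\ C\ \texttt{extends}\ D\ \{C_1\,f_1; \ldots; C_n\,f_n;\ K\ \ldots\} \qquad \mathit{fields}(D) = D_1\,g_1, \ldots, D_m\,g_m}
     {\mathit{fields}(C) = D_1\,g_1, \ldots, D_m\,g_m, C_1\,f_1, \ldots, C_n\,f_n}
$$

Method body lookup:

$$
\frac{CT(C) = \texttt{class}\ C\ \texttt{extends}\ D\ \{\ldots M_1 \ldots M_n\} \qquad M_i = C_0\ m(C_1\,x_1, \ldots, C_k\,x_k)\ \{\ \texttt{return}\ e;\ \}}
     {\mathit{mbody}(m, C) = ((x_1, \ldots, x_k), e)}
$$

$$
\frac{CT(C) = \texttt{class}\ C\ \texttt{extends}\ D\ \{\ldots M_1 \ldots M_n\} \qquad m \text{ is not defined in } M_1, \ldots, M_n}
     {\mathit{mbody}(m, C) = \mathit{mbody}(m, D)}
$$

Fig. 14. EFJ auxiliary definitions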
A Auxiliary Definitions

The function type is used throughout the paper to map an expression into a set of types that includes the types of the values that may result from evaluating the expression. To implement this function, we use the EFJ type-inferencing rules [?] in a pre-processing pass, and annotate each expression with its type and (when this type is an object type) all of its subtypes. The 2EFJ evaluation rules exploit the fact that the qualifying type of the expression is included in the set of types returned by the function. If this were not the case, an illegal call to a specialized virtual method might be generated for a dynamic virtual dispatch, since the virtual method must be declared in the qualifying type. Thus, if a more precise type-inference algorithm were used, the qualifying type would have to be explicitly inserted into the set of classes returned by the function.

The definitions in Figure 14 are used to extract information from the program; they are used throughout the paper. The function CT maps a class name to its definition, the function fields maps a class name to a list of its fields, and the function mbody maps a method name and a class name to the formal parameters and body of this method. As is the case for the original FJ presentation, we have chosen the notion of a “current program” to avoid threading the program definition through all rules.
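The lookup functions of Figure 14 translate directly into code. In this sketch, a hypothetical class table stores each class with its superclass, its own fields and its own methods (method bodies are opaque strings); `fields` walks up to `Object`, putting inherited fields first, and `mbody` falls back to the superclass exactly when the method is not defined locally, as in the last rule of the figure. All record and class names are illustrative.

```java
import java.util.*;

// Hypothetical in-memory class table; method bodies are opaque strings.
record Field(String type, String name) {}
record Method(String name, List<String> params, String body) {}
record ClassDef(String name, String superName, List<Field> fields, List<Method> methods) {}
record MBody(List<String> params, String body) {}

final class ClassTable {
    private final Map<String, ClassDef> ct = new HashMap<>();

    ClassTable(List<ClassDef> program) {
        for (ClassDef c : program) ct.put(c.name(), c);
    }

    // CT(C): the definition of class C in the current program.
    ClassDef lookup(String c) {
        ClassDef d = ct.get(c);
        if (d == null) throw new NoSuchElementException("no class " + c);
        return d;
    }

    // fields(Object) is empty; otherwise the superclass's fields come first, then C's own.
    List<Field> fields(String c) {
        if (c.equals("Object")) return List.of();
        ClassDef d = lookup(c);
        List<Field> out = new ArrayList<>(fields(d.superName()));
        out.addAll(d.fields());
        return out;
    }

    // mbody(m, C): C's own definition of m if there is one, otherwise mbody(m, D).
    MBody mbody(String m, String c) {
        if (c.equals("Object")) throw new NoSuchElementException("method " + m + " not found");
        ClassDef d = lookup(c);
        for (Method md : d.methods())
            if (md.name().equals(m)) return new MBody(md.params(), md.body());
        return mbody(m, d.superName());
    }
}
```

For instance, a hypothetical subclass `BigNode` of `Node` that adds one field and no methods inherits `left` and `right` before its own field, and its `accept` body is found in `Node`.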
$\mathit{no\text{-}bt}(e) = e$ with all binding-time annotations removed

$\mathit{build\text{-}env}((x_1, \ldots, x_n), T_0, (T_1, \ldots, T_n)) = [\mathtt{this} \mapsto T_0, x_1 \mapsto T_1, \ldots, x_n \mapsto T_n]$

CT{C) = class C extends D ; K . . .} CT(C) = class C extends D ; K . . .} 
class-bt{C) = S class-bt{C) = D 

C7T(C) = class C extends D { . . .Mi . . .M^ } 
class-bt{C) = T m not defined in Mi, . . . ,Mfc 

field-bt{C,f) = T bt-signature{C,m) = bt-signature{D,m) 

C'T(C) = class C extends D { . . .Mi . . .Mfc } 

Mi = C m(P) { return e } ^0 = class-bt{C) Vf € 1- . . arity{C,m) T[ = param-bt{^i{P)) 

bt-signature{C, m) = Tq .(T^ j ■ ■ ■ , ^ S 

C'T(C) = class C extends D { . . .Mi . . .Mfc } 

Mi = C m(P) { return e } ^0 = class-bt{C) Vf € 1- . . arity{C,m) = param-bt{^i{P)) 

bt-signature{C, m) = Tq.(T^ j ■ ■ ■ j D 

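$$
\frac{\mathit{mbody}(C, m) = (P, e)}{S\text{-}\mathit{indices}(C, m) = [\,i \in 1 \ldots \mathit{arity}(C, m) \mid \mathit{param\text{-}bt}(\pi_i(P)) = S\,]}
\qquad
\frac{\mathit{mbody}(C, m) = (P, e)}{D\text{-}\mathit{indices}(C, m) = [\,i \in 1 \ldots \mathit{arity}(C, m) \mid \mathit{param\text{-}bt}(\pi_i(P)) = D\,]}
$$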
param-bt{x) = S param-bt{x,T) = D 

$$
\frac{\mathit{mbody}(C, m) = (P, e) \qquad (x_1, \ldots, x_n) = \mathit{no\text{-}bt}(P)}{\mathit{arity}(C, m) = n}
$$

Fig. 15. Auxiliary definitions for Figures 6, 7 and 9 



Figure 15 gives the auxiliary definitions used in Figures 6, 7, and 9. The function no-bt removes all binding-time annotations from an expression. The function build-env builds a type environment for analysis of a method. The function class-bt returns the binding time of a class, and the function field-bt returns the binding time of a given field of a class. The function bt-signature returns the binding-time signature of a method. The functions S-indices and D-indices return lists of indices of method formal parameters that have static (S-indices) and dynamic (D-indices) binding time. Last, the functions param-bt and arity are auxiliary functions used in this figure.
Abstract. Collapsed jungle evaluation is an evaluation strategy for functional programs that can give super-linear speedups compared to conventional evaluation strategies such as call-by-need. However, the former strategy may incur administrative evaluation overhead. We demonstrate how this overhead can be eliminated by transforming the program using a variation of positive supercompilation in which the transformation strategy is based on collapsed jungle evaluation. In penetrating the constant-factor barrier, we seem to be close to establishing a transformation technique that guarantees the efficiency of the transformed program. As a spin-off, we clarify the relationship between call-by-name, call-by-need and collapsed-jungle evaluation, showing that all three can be expressed as instances of a common semantics in which the variations, differing only in efficiency, are obtained by varying the degree of sharing in a DAG representation.



1 Introduction 

Jungle evaluation has arisen from the graph grammar community as a means of speeding up evaluation of term rewrite systems, as described by Habel, Hoffmann, Kreowski and Plump [?]. In short, a jungle is a directed acyclic graph with explicit addresses of nodes. It has been shown that the naive implementation of a function calculating Fibonacci numbers can be made to run in linear time by using evaluation on fully-collapsed jungles [?]. This kind of evaluation is achieved by never allocating new vertices if identical vertices are present in the graph.
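The idea of never allocating a vertex when an identical one already exists can be mimicked, very roughly, by hash-consing the calls of the Fibonacci program of Example 1 below: a table keyed by a call's function name and argument returns the existing vertex, here represented only by its already computed value. This is an approximation for intuition only; real collapsed-jungle evaluation shares graph vertices by address and works on terms built from `0` and `s`, whereas this hypothetical `CollapsedFib` class uses Java numbers and merely counts how many call vertices it allocates.

```java
import java.util.HashMap;
import java.util.Map;

final class CollapsedFib {
    long vertices;                            // call vertices allocated so far
    private final Map<String, Long> table;    // identical-vertex table; null means no sharing

    CollapsedFib(boolean collapse) { table = collapse ? new HashMap<String, Long>() : null; }

    long call(String f, int n) {
        String key = f + " " + n;
        if (table != null && table.containsKey(key)) return table.get(key); // reuse existing vertex
        vertices++;
        long result = f.equals("fib") ? fib(n) : aux(n);
        if (table != null) table.put(key, result);
        return result;
    }

    // fib 0 = s 0;  fib (s x) = aux x
    private long fib(int n) { return n == 0 ? 1 : call("aux", n - 1); }

    // aux 0 = s 0;  aux (s y) = add (aux y) (fib y), with add read as +
    private long aux(int n) { return n == 0 ? 1 : call("aux", n - 1) + call("fib", n - 1); }
}
```

Both evaluators return the same value for `fib 20`, but the sharing one allocates a number of vertices linear in the argument while the naive one allocates exponentially many.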

The fully-collapsed-jungle approach has the drawback that it can be some- 
what expensive to administer the graph. In this paper we will show how we can 
remove the run-time overhead of the above implementation technique by shifting 
the use of fully-collapsed jungles from run time to compile time. Specifically, we 
will do program transformation on fully-collapsed jungles instead of trees. The 
result is that we pay for the overhead once and for all, not every time a program 
is executed. 

A spin-off of this approach is that we present a unified formalism for graph reduction, encompassing call-by-name, call-by-need, and collapsed-jungle evaluation (of which only the first has been omitted in this paper due to space restrictions). We will show that these three reduction strategies can be captured in full by two abstract operations on graphs. We will do this by presenting a programming-language semantics parameterised over these two abstract operations; we will show that any implementation of these two abstract operations will result in the same semantics, as long as the implementation fulfils reasonable criteria. The distinction between any two implementations of the abstract operations is then only one of efficiency. This clarification seems interesting in its own right, since it makes it possible to compare variations of graph reduction.



2 Notation 

We will use a notation that is somewhat non-standard, and an explanation is thus in order. When we write, say, “a b c d”, you should read this as ((a b) c) d. If a is a function, then a b c d means the result (if any) of ((a b) c) d. If a is an uninterpreted symbol (a constructor), then such an application means the term (i.e., tree) consisting of a root labelled a and ordered children b, c, and d.

We let {a b c d} denote the set containing the objects a, b, c, and d; we let (a b c d) denote the tuple containing these four objects; and we let [a b c d] denote the list containing these objects. We use parentheses only as meta-syntax to group objects, that is, to avoid ambiguous interpretations. Thus {(a b c d)} denotes a singleton set (the element being whatever a b c d means). We use $\cup$, $+$, and $\setminus$ as infix operators on sets to denote union, disjoint union, and subtraction, respectively. Given a set $S$, the power set of $S$ is denoted $\wp(S)$; the set of finite lists of elements from $S$ is denoted $S^*$.

We will often need to write sequences such as $x_1\,x_2\,x_3\,x_4\,x_5$, and we will therefore introduce the shorthand notation $(x_\cdot)^5$ for such a sequence: the superscript “5” denotes that the preceding syntactic object “$x_\cdot$” should be replicated five times, with the dot replaced by the consecutive numbers 1, 2, 3, 4, 5. If the replicated object is syntactically simple, we will leave out the dot altogether, and, for example, write $x^n$ instead of $(x_\cdot)^n$. When this kind of notation is used in several layers, the innermost part is expanded first. Hence $\{(x_\cdot \mapsto t^2)^n\}$ means $\{(x_1 \mapsto t_1\,t_2) \cdots (x_{n-1} \mapsto t_1\,t_2)\,(x_n \mapsto t_1\,t_2)\}$; the empty sequence is also allowed, so $\{(x_\cdot \mapsto b)^0\}$ means $\{\} = \emptyset$.

For a relation ${\to} \subseteq S \times T$, we define the domain $\mathcal{D}(\to) = \{s \mid \exists t : s \to t\}$. We say that $\to$ is deterministic if, for all $s \in S$, $s \to t$ and $s \to t'$ imply $t = t'$, that is, $\to$ is a (partial) function. To denote that $f$ is a (partial) function, we write $f \in S \rightharpoonup T$. If the domain of $f$ is finite, we will use $\{(s_\cdot \mapsto t_\cdot)^n\}$ to denote the set of bindings that $f$ comprises. If $\to$ is a binary relation on $S \times S$, we denote by $\to^{+}$ the transitive closure of $\to$, and by $\to^{=}$ the reflexive closure of $\to$. The set of normal forms of $\to$ is $\mathit{NF}(\to) = S \setminus \mathcal{D}(\to)$.
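These definitions are easy to make executable for finite relations. This sketch represents a relation on strings as a set of `Step` pairs (a hypothetical encoding) and computes the domain, the determinism check, the normal forms relative to a given carrier set $S$, and the transitive closure by naive saturation.

```java
import java.util.*;

// A finite relation on strings, given as a set of pairs from -> to.
record Step(String from, String to) {}

final class Relations {
    // D(->) = { s | exists t : s -> t }
    static Set<String> domain(Set<Step> r) {
        Set<String> d = new HashSet<>();
        for (Step s : r) d.add(s.from());
        return d;
    }

    // Deterministic iff no s is related to two different t, i.e. -> is a partial function.
    static boolean deterministic(Set<Step> r) {
        Map<String, String> seen = new HashMap<>();
        for (Step s : r) {
            String prev = seen.putIfAbsent(s.from(), s.to());
            if (prev != null && !prev.equals(s.to())) return false;
        }
        return true;
    }

    // NF(->) = S \ D(->): the elements of the carrier set with no successor.
    static Set<String> normalForms(Set<String> carrier, Set<Step> r) {
        Set<String> nf = new HashSet<>(carrier);
        nf.removeAll(domain(r));
        return nf;
    }

    // Transitive closure: add s -> u whenever s -> t and t -> u, until nothing changes.
    static Set<Step> transitiveClosure(Set<Step> r) {
        Set<Step> closure = new HashSet<>(r);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Step a : List.copyOf(closure))
                for (Step b : List.copyOf(closure))
                    if (a.to().equals(b.from()) && closure.add(new Step(a.from(), b.to())))
                        changed = true;
        }
        return closure;
    }
}
```

For the relation a → b → c over the carrier {a, b, c}, the domain is {a, b}, the only normal form is c, and the transitive closure adds the single pair a → c.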

3 Subject Language 



Our programming language is a small, non-strict, first-order, function-oriented 
language with structured data and pattern matching. 
Example 1 (Fibonacci Numbers). The following program defines the well-known Fibonacci function:

data Nat = 0 | s Nat
fib 0 = s 0
fib (s x) = aux x
aux 0 = s 0
aux (s y) = add (aux y) (fib y)
add 0 y = y
add (s x) y = s (add x y)  □



Remark 2. The above program contains a data-type definition. For clarity, we 
will put such data-type definitions in our example programs, even though such 
data-type definitions are not permitted in the language. □ 



3.1 Syntax 

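Definition 3. Assume denumerable disjoint sets of symbols for constructors $\mathcal{C}$, functions $\mathcal{F}$, pattern functions $\mathcal{G}$ (ranged over by $c$, $f$, and $g$, respectively), and variables $\mathcal{X}$ (ranged over by $x, y$). Then the sets of programs $\mathcal{Q}$, definitions $\mathcal{D}$, terms $\mathcal{T}$, and patterns $\mathcal{P}$ (ranged over by $q$, $d$, $t$, and $p$, respectively) are defined by the abstract syntax grammar

$$
\begin{array}{llcll}
\text{(program)}    & \mathcal{Q} \ni q & ::= & d^m & \text{(definitions)} \\
\text{(definition)} & \mathcal{D} \ni d & ::= & f\,x^n = t \;\mid\; (g\,p_\cdot\,x^n = t_\cdot)^m & \text{(function/matcher)} \\
\text{(pattern)}    & \mathcal{P} \ni p & ::= & c\,x^n & \text{(flat pattern)} \\
\text{(term)}       & \mathcal{T} \ni t & ::= & x \;\mid\; c\,t^n \;\mid\; f\,t^n \;\mid\; g\,t^n & \text{(variable/construction/application/match)} \\
\text{(value)}      & \mathcal{V} \ni v & ::= & c\,v^n &
\end{array}
$$

where $n \ge 0$ and $m > 0$. We require that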

1. No (pattern) function name is defined more than once. 

2. No two patterns in a matcher definition contain the same constructor. 

3. No variable occurs more than once in the left-hand side of a function defini- 
tion (the definition is left-linear). 

4. All variables in the body of a function definition are present among the 
variables in the left-hand side of the definition. 

We let $f\,x^n =_q t$ denote that the program $q$ contains a definition $f\,x^n = t$, and similarly for $g\,p\,x^n =_q t$. As a shorthand, we let $\Sigma = \mathcal{C} \cup \mathcal{F} \cup \mathcal{G}$. The set of variables in a term $t$ is denoted $\mathit{vars}(t)$; a term $t$ is a ground term if $\mathit{vars}(t) = \emptyset$. □

We will shortly define the meaning of our little language in terms of a (parameterised) small-step operational semantics. The idea is that, given a ground term t, we can determine which ground term t' (if any) t reduces to in one step. The usual way to express such a reduction step is to define a relation on terms via substitutions.

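Definition 4 (Substitution, renaming). Given a function $\theta = \{(x_\cdot \mapsto t_\cdot)^n\} \in \mathcal{X} \rightharpoonup \mathcal{T}$ from variables to terms, we denote by $[(x_\cdot \mapsto t_\cdot)^n] \in \mathcal{T} \to \mathcal{T}$ the substitution induced by $\theta$. □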
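A small Java sketch of terms and of the substitution induced by a finite map θ may help fix intuitions for what follows. The `Term`, `Var` and `App` types and the `vars` helper (the set of variables of a term) are hypothetical; constructors, functions and pattern functions are all represented as applications of a symbol and are not distinguished.

```java
import java.util.*;

// Hypothetical term representation: variables, and applications of a symbol
// (constructor, function or pattern function) to argument terms.
interface Term {}
record Var(String name) implements Term {}
record App(String symbol, List<Term> args) implements Term {}

final class Terms {
    // vars(t): the variables occurring in t; t is ground iff this set is empty.
    static Set<String> vars(Term t) {
        Set<String> out = new HashSet<>();
        collect(t, out);
        return out;
    }

    private static void collect(Term t, Set<String> out) {
        if (t instanceof Var v) { out.add(v.name()); return; }
        for (Term a : ((App) t).args()) collect(a, out);
    }

    // The substitution induced by theta: bound variables are replaced, others are kept.
    static Term subst(Map<String, Term> theta, Term t) {
        if (t instanceof Var v) return theta.getOrDefault(v.name(), v);
        App app = (App) t;
        List<Term> args = new ArrayList<>();
        for (Term a : app.args()) args.add(subst(theta, a));
        return new App(app.symbol(), args);
    }
}
```

Applying $[x \mapsto 0]$ to `add (aux x) (fib x)` replaces both occurrences of `x` and yields a ground term.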
3.2 Graphs

In this section we will investigate the interaction between various alternative 
representations of terms. We therefore formally introduce a more general repre- 
sentation of terms, namely directed acyclic graphs (dags). 

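Definition 5 (Graphs). Let $s$ range over a finite set of symbols $S$, and let $\alpha$ and $\beta$ range over a set of addresses $\mathcal{A}$. We then define the following:

1. We denote by nodes over $S$ the set $\mathcal{N}(S) = S \times \mathcal{A}^*$, ranged over by $v$. We will use the shorter notation $v = s \triangleright \alpha^n$ instead of writing $v = (s\ [\alpha^n])$.

2. We denote by directed graphs over $S$ the set $\mathcal{A} \rightharpoonup (\mathcal{N}(S) + \mathcal{A})$ of partial mappings from addresses to nodes/addresses. Any directed graph $G$ induces a binary relation ${\to_G} \subseteq \mathcal{A} \times \mathcal{A}$ defined by $\beta \to_G \alpha_i$ iff $G\,\beta = \alpha_i$, or $G\,\beta = s \triangleright \alpha^n$ and $i \le n$.

3. We denote by acyclic directed graphs over $S$ the set $\mathit{DAG}(S) \subseteq \mathcal{A} \rightharpoonup (\mathcal{N}(S) + \mathcal{A})$ of acyclic graphs (i.e., $G$ is acyclic iff $\nexists \alpha \in \mathcal{D}(G) : \alpha \to_G^{+} \alpha$). We let $V$ range over the set of acyclic graphs.

4. We denote by configurations over $S$ the set $\mathcal{K}(S) = \mathit{DAG}(S) \times \mathcal{A}$, ranged over by $K$.

5. If an address is mapped to another address (i.e., not to a node), we call both the former address and the mapping an indirection, and we define
$$\|V\| = \{\alpha \mapsto \beta \mid V\,\alpha = \beta \in \mathcal{A}\}.$$
The relation ${\rightsquigarrow} \subseteq \mathcal{K}(S) \times \mathcal{A}$ is defined inductively by
$$
\begin{array}{ll}
(V\,\alpha) \rightsquigarrow \alpha, & \text{if } \alpha \notin \mathcal{D}(\|V\|) \\
(V\,\alpha) \rightsquigarrow \alpha', & \text{if } \|V\|\,\alpha = \alpha'' \text{ and } (V\,\alpha'') \rightsquigarrow \alpha'.
\end{array}
$$

6. Let $A$ be a set of addresses. We let $V/A$ denote the graph $V$ restricted to $\mathcal{D}(V) \setminus A$.

7. We let FRESH be a procedure that provides us with a completely new address each time it is called. We implicitly assume that every address mentioned has been drawn by this procedure; thus FRESH will provide addresses that cannot be captured. □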

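Example 6 (DAG). Let {0 s add fib aux} be a set of symbols and {α β γ δ ε} a set of addresses, and consider the graph

$$V = \{\alpha \mapsto \mathit{add} \triangleright \gamma\,\beta,\ \ \gamma \mapsto \mathit{aux} \triangleright \varepsilon,\ \ \delta \mapsto \mathit{fib} \triangleright \varepsilon,\ \ \beta \mapsto \delta\}.$$

Then $\mathcal{D}(V) = \{\alpha\,\gamma\,\delta\,\beta\}$, $\|V\| = \{\beta \mapsto \delta\}$, $(V\,\beta) \rightsquigarrow \delta$ and, e.g., $(V\,\varepsilon) \rightsquigarrow \varepsilon$. □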
From the above example, you can see that we intend to use the set $\Sigma = \mathcal{C} \cup \mathcal{F} \cup \mathcal{G}$ as the set of symbols that our graphs are defined over. We will now clarify the correspondence between such graphs and the terms of our little language. It is fairly straightforward to extract a term from a graph:

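Definition 7 (Extract). Let $\phi \in \mathcal{A} \to \mathcal{X}$ be a unique bijection from addresses to variables. By $\phi_\alpha$ we denote the variable $\phi\,\alpha$. The function $\mathit{xtract} \in \mathcal{K}(\Sigma) \to \mathcal{T}$ is defined inductively by

$$
\begin{array}{lcll}
\mathit{xtract}\,(V\,\alpha) &=& \phi_\alpha, & \text{if } \alpha \notin \mathcal{D}(V) \\
\mathit{xtract}\,((V + \{\alpha \mapsto \beta\})\,\alpha) &=& \mathit{xtract}\,(V\,\beta) & \\
\mathit{xtract}\,((V + \{\alpha \mapsto s \triangleright \beta^n\})\,\alpha) &=& s\,(\mathit{xtract}\,(V\,\beta_\cdot))^n & \quad \square
\end{array}
$$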

Example 8 (Extraction). Assume $\phi_\varepsilon = x$, and consider the graph $V$ from Ex. 6; we have that $\mathit{xtract}\,(V\,\beta) = \mathit{fib}\ x$ and $\mathit{xtract}\,(V\,\alpha) = \mathit{add}\ (\mathit{aux}\ x)\ (\mathit{fib}\ x)$. □
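Definitions 5 and 7 can be prototyped directly. In this hypothetical Java sketch, addresses are ints, a graph maps each address either to a `Node` or to another address (an indirection), `follow` computes the address reached by ⇝, and `xtract` reads a term back as a string, turning each address outside the domain into a variable rendered as `x<address>` (a stand-in for $\phi_\alpha$). Acyclicity is assumed rather than checked. Building the graph of Example 6 with α, β, γ, δ, ε numbered 1 to 5 (our reading of that example) reproduces Example 8.

```java
import java.util.*;

// A node s ▹ α1..αn: a symbol plus the addresses of its children.
record Node(String symbol, List<Integer> children) {}

// Hypothetical encoding of Definition 5: a partial map from addresses (ints)
// to either a Node or another address (an indirection).
final class Graph {
    private final Map<Integer, Object> map = new HashMap<>();

    void node(int a, String symbol, List<Integer> children) { map.put(a, new Node(symbol, children)); }
    void indirection(int a, int b) { map.put(a, b); }

    // (V α) ~> α': follow indirections until reaching an address that is not one.
    int follow(int a) {
        Object o = map.get(a);
        while (o instanceof Integer b) {
            a = b;
            o = map.get(a);
        }
        return a;
    }

    // xtract (Definition 7), rendering terms as strings such as "(add (aux x5) (fib x5))".
    String xtract(int a) {
        Object o = map.get(a);
        if (o == null) return "x" + a;               // address outside the domain: variable
        if (o instanceof Integer b) return xtract(b); // indirection
        Node n = (Node) o;
        if (n.children().isEmpty()) return n.symbol();
        StringBuilder sb = new StringBuilder("(").append(n.symbol());
        for (int c : n.children()) sb.append(' ').append(xtract(c));
        return sb.append(')').toString();
    }
}
```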

The opposite translation, from terms to graphs, however, depends on how much sharing of sub-terms we want to have. Furthermore, the precise formulation of operations on the graph representation of a term will depend on how far we are willing to go to maintain sharing of sub-terms. To abstract away from such preferences, we define two basic operations on our graphs. The first operation, upd, has the signature $\mathit{upd} \in \mathit{DAG}(\Sigma) \to \mathcal{A} \to \mathcal{N}(\Sigma) \to \mathit{DAG}(\Sigma)$. It takes as arguments a graph $V$, a target address $\alpha$, and a node $v$. Intuitively, calling upd is like placing a piece in a jigsaw puzzle: the pockets in the new piece are attached to the tabs of the surrounding puzzle, and likewise the tabs of the piece are fitted into the surrounding pockets; more concretely, upd updates $V$ with $v$ at $\alpha$ such that it connects with existing nodes in $V$. Formally, it must hold that

$$
V' = \mathit{upd}\ V\ \alpha\ (s \triangleright \alpha^n)
\quad\text{implies}\quad
\forall \beta \in \mathcal{A} : \mathit{xtract}\,(V'\,\beta) = [\phi_\alpha \mapsto s\,(\mathit{xtract}\,(V\,\alpha_\cdot))^n]\,(\mathit{xtract}\,(V\,\beta)) \qquad (\dagger)
$$

provided $\alpha \notin \mathcal{D}(V)$. The above says that we should be able to extract the same terms from the updated graph, except that the variable $\phi_\alpha$ has been replaced with $s\,(\mathit{xtract}\,(V\,\alpha_\cdot))^n$. A straightforward implementation obeying this rule is $\mathit{upd}\ V\ \alpha\ v = V + \{\alpha \mapsto v\}$.

The second operation, subst, has the signature $\mathit{subst} \in \mathit{DAG}(\Sigma) \to \mathcal{A} \to \mathcal{T} \to \Psi \to \mathit{DAG}(\Sigma)$, where $\Psi = \mathcal{X} \rightharpoonup \mathcal{A}$ are partial functions from variables to addresses. Function subst takes as arguments a graph $V$, a target address $\alpha$, a term $t$, and a mapping $\psi$ (from the free variables in $t$ to addresses in $V$). To stick with the jigsaw-puzzle analogy, the effect of subst is to cut a picture into a collection of pieces, and then connect this collection of pieces to the existing puzzle; from $t$, a collection of nodes is made such that the variables of $t$ are connected to existing nodes, and the root of $t$ is placed at address $\alpha$. As the name suggests, this operation is used to perform what corresponds to substitution in the world of terms. Formally, it must hold that

$$
V' = \mathit{subst}\ V\ \alpha\ t\ \psi
\quad\text{implies}\quad
\forall \beta \in \mathcal{D}(V) \cup \{\alpha\} : \mathit{xtract}\,(V'\,\beta) = \theta\,(\mathit{xtract}\,(V\,\beta)) \qquad (\ddagger)
$$

where

$$
\theta = [\phi_\alpha \mapsto [x \mapsto \mathit{xtract}\,(V\,(\psi\,x)) \mid x \in \mathit{vars}(t)]\ t],
$$

provided $\alpha \notin \mathcal{D}(V)$. The above says that we only allow extensions to graphs, that is, we require that, as long as we stay inside the domain of the graph as it were, the same terms will be extracted after the operation, except that, instead of materialising into a variable, address $\alpha$ will now materialise into $t$ with the free variables replaced. Note that the operation might have added more than just $\alpha$ to the domain of the graph. See Fig. ? for a straightforward implementation.

Example 9. Take the graph $V$ from Ex. 6 and let $V_0 = V/\{\beta\}$, shown below to the left. Two legal results, with respect to the requirement on subst, of $\mathit{subst}\ V_0\ \beta\ (s\,(\mathit{fib}\ x))\ \{x \mapsto \varepsilon\}$ are shown below to the right.





In the rest of this section we will be concerned with properties that are 
independent of the particular implementation of upd and subst. We will therefore 
talk about families of operations and functions indexed by such implementations. 

Definition 10 (Realisation). A realisation $I$ is an implementation of upd and subst, denoted $\mathit{upd}_I$ and $\mathit{subst}_I$, such that $\mathit{upd}_I$ satisfies (1) and $\mathit{subst}_I$ satisfies (2). □

3.3 Semantics 

As promised, we can now present a semantics for our little language. The semantics is a small-step operational semantics (Plotkin style [13]) parametrised
by a realisation. First we need to translate the initial ground term into graph 
representation. 

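Definition 11 (Initial configuration). Given a realisation $I$, the function $\mathit{init}_I \in \mathcal{T} \to \mathcal{K}(\Sigma)$ is defined as $\mathit{init}_I\,t = ((\mathit{subst}_I\ \emptyset\ \alpha_0\ t\ \emptyset)\ \alpha_0)$, where $\alpha_0 = \mathit{fresh}$. □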

That is, a new graph is built (on top of an empty graph) such that it repre- 
sents the initial ground term. Since t is a ground term, it contains no variables, 
and thus the variable-to-address mapping is empty. 
Theorem 12. For all realisations $I$ and ground terms $t$, $\mathit{xtract}\,(\mathit{init}_I\,t) = t$.



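Definition 13 ($\xrightarrow[q\cdot I]{}$). Given a program $q$ and a realisation $I$, the binary relation $\xrightarrow[q\cdot I]{} \subseteq \mathcal{K}(\Sigma) \times \mathcal{K}(\Sigma)$ is defined as the smallest relation satisfying the inference system

(apply)
$$\frac{f\,x^n = t \qquad V' = \mathit{subst}\ V\ \alpha\ t\ \{(x_i \mapsto \alpha_i)^n\}}{((V + \{\alpha \mapsto f\,\alpha^n\})\ \alpha) \Rightarrow (V'\ \alpha)}$$

(select)
$$\frac{(V\,\alpha_0) \rightarrowtail \alpha_0' \qquad V\,\alpha_0' = c\,\beta^m \qquad g\,(c\,x^m)\,y^n = t \qquad V' = \mathit{subst}\ V\ \alpha\ t\ \{(x_i \mapsto \beta_i)^m, (y_i \mapsto \alpha_i)^n\}}{((V + \{\alpha \mapsto g\,\alpha_0\,\alpha^n\})\ \alpha) \Rightarrow (V'\ \alpha)}$$

(dive)
$$\frac{(V\,\alpha_0) \rightarrowtail \alpha_0' \qquad (V\,\alpha_0') \Rightarrow (V'\,\alpha_0')}{((V + \{\alpha \mapsto g\,\alpha_0\,\alpha^n\})\ \alpha) \Rightarrow ((\mathit{upd}\ V'\ \alpha\ (g\,\alpha_0\,\alpha^n))\ \alpha)}$$

(trans)
$$\frac{(V\,\alpha) \Rightarrow (V'\,\alpha)}{(V\,\alpha) \rightsquigarrow (V'\,\alpha)}$$

(const)
$$\frac{m \le n \qquad \forall i < m : \mathit{xtract}\,(V\,\alpha_i) \in \mathcal{V} \qquad (V\,\alpha_m) \rightarrow (V'\,\alpha_m)}{((V + \{\alpha \mapsto c\,\alpha^n\})\ \alpha) \rightsquigarrow ((\mathit{upd}\ V'\ \alpha\ (c\,\alpha^n))\ \alpha)}$$

(skip)
$$\frac{(V\,\alpha) \rightarrowtail \alpha' \qquad (V\,\alpha') \rightsquigarrow (V'\,\alpha')}{(V\,\alpha) \rightarrow (V'\,\alpha')}$$

The subscript $q \cdot I$ has been omitted to avoid clutter. □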

The inference rules define three relations: $\Rightarrow$, $\rightsquigarrow$, and $\rightarrow$. The relation $\Rightarrow$ relates a configuration containing a function call to the result of the call.
Operationally, you can read the rule apply as “replace the function call with the 
function body, in which the variables have been replaced by the arguments to 
the function”; the subst function takes care of both the substitution and the 
translation from term to graph representation. The select and dive rules take 
care of pattern-matching functions: In the former case, the first argument to the 
function has an outermost constructor, and therefore the call can be replaced 
by the body of the matching function (similar to the apply rule). In the latter 
case, the first argument to the function does itself contain something that can be 
rewritten by a single step (i.e., a function call); this one-step rewrite is performed, 
and the result of this rewrite is written back into the graph by an upd call. The 







Driving in the Jungle 205 



mutually recursive relations $\rightsquigarrow$ and $\rightarrow$ “dig into” the graph to locate the next function call to reduce. The skip rule simply skips over indirections and passes control to the trans or const rules, of which the former simply passes control to the above-mentioned $\Rightarrow$ relation, which means that a function call has been located. If a function call has not been located (that is, as long as there are only constructors to the left of the current address), the const rule digs into the leftmost subgraph that can be rewritten (i.e., contains a function call); the rewrite is performed, and the result is written back into the graph by an upd call. The $\mathit{xtract}\,(V\,\alpha_i) \in \mathcal{V}$ part of the premise for const ensures that no rewrites are possible to the left of $\alpha_m$. The relation $\rightarrow$ is thus responsible for reducing the graph from left to right.

Theorem 14. The relation $\rightarrow$ is well-defined and deterministic.

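Definition 15 (Evaluation). Given a realisation $I$, we define the function $\mathit{eval}_{q \cdot I} \in \mathcal{T} \to \mathcal{P}(\mathcal{V})$ by

$$\mathit{eval}_{q \cdot I}\,t = \{v \mid (\mathit{init}_I\,t) \xrightarrow[q \cdot I]{}^{*} \kappa \in \mathit{NF}(\xrightarrow[q \cdot I]{}) \wedge (\mathit{xtract}\,\kappa) = v \in \mathcal{V}\}.$$ □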

We will now state two important properties about the semantics of our little 
language. The first is a direct consequence of the relation $\rightarrow$ being deterministic.

Corollary 16. A ground term evaluates to at most one value. 

A ground term may fail to evaluate to a value for two reasons: Either the 
computation consists of an infinite number of steps, or the computation “gets 
stuck” at some point. The former reason is usually called non-termination and 
is an inherent unpleasantness in any universal programming language. We can, 
however, circumvent the latter situation by imposing a standard polymorphic 
type system on our language to reject program/term pairs that will get stuck in 
a normal form that is not a value. We will not pursue this matter further in this 
paper, but simply assume that all programs and terms are type correct. 

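Definition 17 (Correct). Given program $q$ and ground term $t$, we say that the pair $(q\ t)$ is correct if $(\mathit{init}_I\,t) \xrightarrow[q \cdot I]{}^{*} \kappa \in \mathit{NF}(\xrightarrow[q \cdot I]{})$ implies $(\mathit{xtract}\,\kappa) \in \mathcal{V}$, for all realisations $I$. □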

The second property — and the main reason for the preceding rigor — is 
that the evaluation of a term (w.r.t. a particular program) always gives us the 
same result for any realisation. 

Theorem 18 (Realisation independence). For any program $q$ and ground term $t$, if the pair $(q\ t)$ is correct, then $\mathit{eval}_{q \cdot A}\,t = \mathit{eval}_{q \cdot B}\,t$ for any two realisations $A$ and $B$.

Example 19. Consider the Fibonacci program $q$ from the earlier example. The pair consisting of $q$ and term fib (s (s (s (s 0)))) is correct and evaluates to s (s (s (s (s 0)))). The pair
consisting of q and term fib (s a) is not correct, since evaluation gets stuck in a 
configuration representing the term aux A. □ 
4 Graph Machinery

In this section we will present two implementations of subst and upd. The two 
implementations differ in how they handle sharing of identical subterms. 

4.1 Call-by-Need 

The most straightforward of these implementations is shown in Fig. 1; it works as follows.

The upd function simply adds a node to the graph. The subst function calls the auxiliary function saux to convert $t$ into graph representation. If the resulting address $\alpha'$ of this conversion is different from the preferred target $\alpha$, an indirection is made from $\alpha$ to $\alpha'$. The auxiliary function saux converts a term $t$ into graph representation such that the variables in $t$ are converted into existing addresses in the graph, thus possibly introducing sharing of subgraphs. More precisely, saux adds new nodes to the graph by recursively decomposing $t$: If $t$ has $s$ as root and $n$ subterms, $n$ fresh addresses are chosen and fed to recursive calls to saux (thus ensuring that the $n$ subterms have been converted into graph representation), and a new node labelled $s$ is created. If $t$ is a variable $x$, however, no new node is created; instead the provided mapping $\psi$ tells us which existing address to “substitute” for $x$. The address representing $t$ can thus be different from the preferred target $\alpha$.



    subst ∈ 𝒢(Σ) → 𝒜 → 𝒯 → Ψ → 𝒢(Σ)
    subst V α t ψ = let (V' α') = saux V α t ψ in if α = α' then V' else V + {α ↦ α'}

    saux ∈ 𝒢(Σ) → 𝒜 → 𝒯 → Ψ → 𝒢(Σ) × 𝒜
    saux V_0 α t ψ = if t ∈ 𝒳 then (V_0 (ψ t))
                     else let s t^n = t; ((V_i α_i) = saux V_{i−1} fresh t_i ψ)^n
                          in ((V_n + {α ↦ s α^n}) α)

    upd ∈ 𝒢(Σ) → 𝒜 → 𝒩(Σ) → 𝒢(Σ)
    upd V α ν = V + {α ↦ ν}

Fig. 1. DAG representation of terms (call-by-need).¹



It is not hard to see that the implementation in Fig. 1 provides the basis for the standard notion of call-by-need: when used in conjunction with the inference rules defined in Def. 13, all occurrences of the same variable (in a function body) share the same subgraph. When it is necessary to reduce one of these occurrences, all the other occurrences will share the rewrite performed by the inference rules.
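The call-by-need realisation of Fig. 1 can be sketched in Haskell as follows. This is an illustration under stated assumptions, not the authors' code: `fresh` is modelled by threading an address counter through the graph, ψ is an association list, and `initI` follows Def. 11. Because a variable is mapped to the address that ψ gives for it rather than copied, every occurrence of the same variable ends up pointing at one shared subgraph.

```haskell
type Addr = Int

data Term = Var String | App String [Term]
  deriving (Eq, Show)

data Node = Node String [Addr] | Ind Addr
  deriving (Eq, Show)

-- A graph paired with an address supply: addresses >= nextFree are unused.
data Graph = Graph { nextFree :: Addr, nodes :: [(Addr, Node)] }

emptyGraph :: Graph
emptyGraph = Graph 0 []

-- Stand-in for the paper's "fresh": hand out the next unused address.
fresh :: Graph -> (Addr, Graph)
fresh g = (nextFree g, g { nextFree = nextFree g + 1 })

-- V + {a |-> n}
extend :: Graph -> Addr -> Node -> Graph
extend g a n = g { nodes = (a, n) : nodes g }

xtract :: Graph -> Addr -> Term
xtract g a = case lookup a (nodes g) of
  Nothing          -> Var ("rho" ++ show a)
  Just (Ind b)     -> xtract g b
  Just (Node s cs) -> App s (map (xtract g) cs)

type Env = [(String, Addr)]  -- psi: variables to addresses

-- A variable reuses the address psi assigns to it (this is where sharing
-- arises); any other term gets a new node at a, with children at fresh addresses.
saux :: Graph -> Addr -> Term -> Env -> (Graph, Addr)
saux g _ (Var x) psi = case lookup x psi of
  Just b  -> (g, b)
  Nothing -> error ("saux: unbound variable " ++ x)
saux g a (App s ts) psi = (extend gn a (Node s cs), a)
  where
    (gn, cs) = foldl step (g, []) ts
    step (gi, acc) t =
      let (b, gi')   = fresh gi
          (gi'', b') = saux gi' b t psi
      in (gi'', acc ++ [b'])

-- If t lands at another address (t is a variable), add an indirection.
-- Fig. 1 writes V + {a |-> a'}; this coincides since g' = g in that case.
subst :: Graph -> Addr -> Term -> Env -> Graph
subst g a t psi =
  let (g', a') = saux g a t psi
  in if a == a' then g' else extend g' a (Ind a')

-- Def. 11: build the ground term on top of the empty graph.
initI :: Term -> (Graph, Addr)
initI t = let (a0, g0) = fresh emptyGraph in (subst g0 a0 t [], a0)
```

For instance, substituting `add x x` with `x` mapped to an address holding `0` creates a single `add` node whose two children are the same address, which is exactly the sharing that call-by-need relies on.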

4.2 Collapsed Jungle 

The implementation presented in Fig. 2 is far more interesting, since it will maintain as much sharing as possible. The upd function will never add a new node if there already exists a similar node. That is, a new node $\nu$ will not be added to the graph $V$ at address $\alpha$ if there already exists a node at $\beta$ such that $\mathit{xtract}\,(V\,\beta) = \mathit{xtract}\,((V + \{\alpha \mapsto \nu\})\,\alpha)$. In case such a $\beta$ exists, we only add an indirection from $\alpha$ to $\beta$ to the graph. Furthermore, after adding something to the graph (be it a node or an indirection), the resulting graph is collapsed such that multiple occurrences of similar nodes (in the above sense) are eliminated, and all nodes will have nodes (not indirections) as descendants. Similarly, the subst function will make sure that superfluous nodes are not added to the graph: subst calls the auxiliary function saux to convert the term into graph representation, and if the resulting address is different from the preferred target, an indirection is created, and the (collapsed) graph is returned. The auxiliary function saux works as in the call-by-need case, except that a node is not created if a similar one exists.



    subst ∈ 𝒢(Σ) → 𝒜 → 𝒯 → Ψ → 𝒢(Σ)
    subst V α t ψ = let (V' α') = saux V α t ψ
                    in if α' = α then V' else collapse (V' + {α ↦ α'})

    saux ∈ 𝒢(Σ) → 𝒜 → 𝒯 → Ψ → 𝒢(Σ) × 𝒜
    saux V_0 α t ψ = if t ∈ 𝒳 then let (V_0 (ψ t)) ↣ α' in (V_0 α')
                     else let s t^n = t; ((V_i α_i) = saux V_{i−1} fresh t_i ψ)^n
                          in if ∃β ∈ dom(V_n) : V_n β = s β^n ∧ ((V_n β_i) ↣ α_i)^n
                             then (V_n β) else ((V_n + {α ↦ s α^n}) α)

    upd ∈ 𝒢(Σ) → 𝒜 → 𝒩(Σ) → 𝒢(Σ)
    upd V α (s α^n) = let ((V α_i) ↣ α_i')^n
                      in if ∃β : V β = s β^n ∧ ((V β_i) ↣ α_i')^n
                         then collapse (V + {α ↦ β})
                         else collapse (V + {α ↦ s α'^n})

    collapse ∈ 𝒢(Σ) → 𝒢(Σ)
    collapse V = if ∄α : V α = s α^n ∧ {α^n} ∩ dom(‖V‖) ≠ ∅ then V
                 else let V' + {α ↦ s α^n} = V such that {α^n} ∩ dom(‖V‖) ≠ ∅
                      in upd V' α (s α^n)

Fig. 2. Collapsed-jungle representation of terms.¹



In view of our semantic inference rules, collapsing a graph is highly beneficial: 
Rewriting a single node at $\alpha$ in a fully collapsed graph will effectively rewrite all
subterms identical to the one that can be extracted from a. 
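The heart of Fig. 2 is that saux reuses an existing node instead of creating a similar one. A minimal Haskell sketch of just that step follows, assuming (our simplification) that the graph is already fully collapsed and free of indirections, so that two nodes denote the same term exactly when they carry the same symbol and the same child addresses; the similarity check then becomes a lookup, i.e. hash-consing. `collapse`, `upd`, and the indirection-following of Fig. 2 are not modelled, and `sauxShared` is a hypothetical name.

```haskell
type Addr = Int

data Term = Var String | App String [Term]
  deriving (Eq, Show)

data Node = Node String [Addr] | Ind Addr
  deriving (Eq, Show)

-- Graph plus address supply: every address >= nextFree is unused.
data Graph = Graph { nextFree :: Addr, nodes :: [(Addr, Node)] }

emptyGraph :: Graph
emptyGraph = Graph 0 []

fresh :: Graph -> (Addr, Graph)
fresh g = (nextFree g, g { nextFree = nextFree g + 1 })

extend :: Graph -> Addr -> Node -> Graph
extend g a n = g { nodes = (a, n) : nodes g }

xtract :: Graph -> Addr -> Term
xtract g a = case lookup a (nodes g) of
  Nothing          -> Var ("rho" ++ show a)
  Just (Ind b)     -> xtract g b
  Just (Node s cs) -> App s (map (xtract g) cs)

-- Like the call-by-need saux, but reuse an existing node with the same
-- symbol and the same children instead of allocating a new one.
sauxShared :: Graph -> Addr -> Term -> [(String, Addr)] -> (Graph, Addr)
sauxShared g _ (Var x) psi = case lookup x psi of
  Just b  -> (g, b)
  Nothing -> error ("sauxShared: unbound variable " ++ x)
sauxShared g a (App s ts) psi =
  case [b | (b, Node s' bs) <- nodes gn, s' == s, bs == cs] of
    (b : _) -> (gn, b)                       -- a similar node exists: share it
    []      -> (extend gn a (Node s cs), a)  -- otherwise allocate it at a
  where
    (gn, cs) = foldl step (g, []) ts
    step (gi, acc) t =
      let (b, gi')   = fresh gi
          (gi'', b') = sauxShared gi' b t psi
      in (gi'', acc ++ [b'])
```

Building `add (s 0) (s 0)` this way yields three nodes rather than five, and both arguments of `add` point to the same `s` node; a rewrite of that node would therefore be seen by both occurrences.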

Theorem 20. The two implementations shown in Figs. 1 and 2 are realisations.

5 Transformation 

It seems intuitively right that the standard graph machinery (Fig. 1) induces very little administrative overhead, whereas the collapsed-jungle graph machinery (Fig. 2) can be burdensome. We therefore propose a feasible compromise:



¹ The notation $(\cdots = \cdots)^n$ is shorthand for $n$ equations.
Optimise programs at compile time using collapsed jungles, but use standard
graph machinery at run time. As we will see, it is possible to achieve some 
of the advantages of collapsed-jungle reduction by a source-to-source program 
transformation. 

The program transformation we present here is a variant of supercompilation (Turchin [19, 20]), more specifically positive supercompilation (Sørensen, Glück and Jones [15]), modified to work on jungles instead of terms. The transformation process is divided into two phases. First, a finite model of the program is constructed w.r.t. a term. Second, a new program is extracted from the model.



5.1 Driving 

Glück and Klimov [5] call a model of a program a process graph. The nodes in the
graph are labelled by terms (i.e., program state), and each successor of a node 
represents a one-step unfolding. A branch in the process graph thus represents 
speculative execution of a particular term. Leaves in the process graph represent 
terms that are fully evaluated. For a particular program q, each full path in a 
(possibly cyclic) process graph for q represents a set of actual executions of q, 
such that the union of all full paths in the process graph includes all possible 
executions of q. 

To keep the process graph manageable, cycles are represented implicitly by 
leaves containing repetitions of previously seen terms; the graph thus simply 
becomes a tree. 

To construct the process tree, we need to drive the program, that is, speculatively execute non-ground terms. For this purpose we present two modifications of the semantics of the language. The first simply allows variables in terms.

Definition 21 (Deterministic unfolding, $\mapsto$). Let the set of constructor terms $\mathcal{B}$ be given by the grammar $b ::= x \mid c\,b^n$. The relation $\mapsto \subseteq \mathcal{T} \times \mathcal{T}$ is defined as $\rightarrow$ in Def. 13, except that $\mathcal{V}$ is replaced by $\mathcal{B}$ in the const rule. □

The new relation $\mapsto$ is still deterministic, but it allows each reduction step to ignore what corresponds to uninstantiated parts to the left of a redex. With the relation $\mapsto$ we can reduce non-ground terms as long as we do not run into redices of the form $g\,x\,t^n$. To reduce such redices, we need to speculatively try out all possible forms of values of $x$, according to the definition of $g$.

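Definition 22 (Speculative unfolding). The relation $\Mapsto \subseteq \mathcal{T} \times \mathcal{T}$ is defined as $\mapsto$, but with the additional rule

(inst)
$$\frac{(V\,\alpha_0) \rightarrowtail \alpha_0' \qquad \alpha_0' \notin \mathrm{dom}(V) \qquad (\beta_i = \mathit{fresh})^m \qquad V' = \mathit{upd}\ (V + \{\alpha \mapsto g\,\alpha_0\,\alpha^n\})\ \alpha_0'\ (c\,\beta^m)}{((V + \{\alpha \mapsto g\,\alpha_0\,\alpha^n\})\ \alpha) \Mapsto (V'\ \alpha)}$$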



The relation $\Mapsto$ is non-deterministic. When it encounters a stuck redex $g\,x\,t^n$, it “produces” a new DAG where $g\,x\,t^n$ has been instantiated to $g\,(c\,x^m)\,t^n$ for each pattern $c\,x^m$ defined by $g$. Each of these instantiated DAGs will allow further reduction to take place, since each appropriate right-hand side of $g$ now can be unfolded.
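At the level of plain terms (ignoring the DAG representation, which is our simplification), the effect of the inst rule can be sketched as follows: for each constructor pattern of $g$, given as a constructor name and its arity, every occurrence of the stuck variable is replaced by that constructor applied to fresh variables. Replacing all occurrences mirrors the DAG version, where a single upd at the shared address updates every reference at once. The naming scheme `x_1 ... x_m` and the function names are hypothetical, and the scheme assumes these names do not already occur in the term.

```haskell
data Term = Var String | App String [Term]
  deriving (Eq, Show)

-- Replace every occurrence of variable x by the term r.
substVar :: String -> Term -> Term -> Term
substVar x r (Var y)
  | x == y    = r
  | otherwise = Var y
substVar x r (App s ts) = App s (map (substVar x r) ts)

-- One successor per constructor pattern (c, arity m) of the stuck function:
-- x is instantiated to c x_1 ... x_m everywhere in t.
instantiate :: [(String, Int)] -> String -> Term -> [Term]
instantiate pats x t =
  [ substVar x (App c [Var (x ++ "_" ++ show k) | k <- [1 .. m]]) t
  | (c, m) <- pats ]
```

For a function whose patterns are `0` and `s y`, instantiating `x` in `aux x` yields `aux 0` and `aux (s x_1)`, one successor per pattern.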

It is now easy to see how we can create a process tree for a program $q$ and an initial term $t$: First, pick some realisation $I$ and label the root of the process tree by a DAG created from $t$. Then, repeatedly add new leaves to the process tree by using the relation $\Mapsto$ to drive existing leaves.



5.2 Generalisations 

Unfortunately, creating a process tree in the above manner hardly ever terminates; that is, the process tree will grow unboundedly. But, as shown by Sørensen & Glück [14], a sufficient condition for ensuring that the construction of process trees terminates is to impose a well-quasi-order on the labels in the process tree.



Definition 23 (wqo). A well-quasi-order on a set $S$ is a reflexive, transitive binary relation $\le$ such that, for any infinite sequence $s_1 s_2 \cdots$ of elements from $S$, there are $i, j \in \mathbb{N}$ such that $i < j$ and $s_i \le s_j$. □

Hence, if we ensure that, for all nodes $\nu$ in the process tree, there never exists an ancestor $\mu$ of $\nu$ such that $\mathit{label}(\mu) \le \mathit{label}(\nu)$, then all branches in the process tree will be finite. Since the process tree is finitely branching, the process tree will be finite (by König's Lemma).
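The termination argument above only needs some well-quasi-order and a check against the ancestors on the current branch. A generic Haskell sketch of such a check (often called a whistle in the supercompilation literature) scans a branch, given as a list of labels from the root downwards, and reports the first position whose label has an earlier label below it in the order. The function name is hypothetical, and the ordering on non-negative integers used in the example is only a stand-in for an order on process-tree labels.

```haskell
-- Scan a branch (labels listed from the root downwards) and return the
-- index of the first label x for which some earlier label y has `le y x`.
whistle :: (a -> a -> Bool) -> [a] -> Maybe Int
whistle le = go [] 0
  where
    go _ _ [] = Nothing
    go seen j (x : rest)
      | any (`le` x) seen = Just j
      | otherwise         = go (x : seen) (j + 1) rest
```

Because $\le$ on the natural numbers is a well-quasi-order, the check fires even on an infinite branch such as `cycle [3, 2, 1]`, at position 3, whereas the finite, strictly decreasing branch `[5, 4, 3]` never triggers it.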

Sørensen & Glück [14] used the homeomorphic-embedding relation on terms to detect when termination is endangered; we will use this relation in ensuring termination, so a repetition is in order.

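Definition 24. Let $s \in \Sigma$, $x, y \in \mathcal{X}$, and $t, u \in \mathcal{T}$. The homeomorphic-embedding relation $\trianglelefteq \subseteq \mathcal{T} \times \mathcal{T}$ is the smallest relation satisfying the inference rules

$$\frac{(t_i \trianglelefteq u_i)^n}{s\,t^n \trianglelefteq s\,u^n} \qquad \frac{t \trianglelefteq u_i \quad 1 \le i \le n}{t \trianglelefteq s\,u^n} \qquad \frac{}{x \trianglelefteq y}$$ □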
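Def. 24 translates directly into a recursive test. In this sketch (our own encoding), terms are either variables or a symbol applied to arguments, and `embeds t u` decides $t \trianglelefteq u$ by trying the three rules: any two variables embed, diving looks for $t$ inside some argument of $u$, and coupling requires equal head symbols with pairwise embedded arguments.

```haskell
data Term = Var String | App String [Term]
  deriving (Eq, Show)

-- embeds t u  decides  t ⊴ u
embeds :: Term -> Term -> Bool
embeds (Var _) (Var _) = True
embeds t u = dive || couple
  where
    -- diving: t is embedded in some argument of u
    dive = case u of
      App _ us -> any (embeds t) us
      Var _    -> False
    -- coupling: same head symbol, arguments embedded pairwise
    couple = case (t, u) of
      (App s ts, App s' us) ->
        s == s' && length ts == length us && and (zipWith embeds ts us)
      _ -> False
```

For example, `fib x` is embedded in `add (fib (s y)) z` by diving into the first argument and then coupling, whereas `s 0` is not embedded in `0`.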

Since the homeomorphic-embedding relation is a well-quasi-order, it can be used as an indicator for when to stop the development of the process tree, that is, when to stop driving. The question is then what to do when we need to stop driving. As described by Turchin [19], we need to generalise one of the offending nodes in the process tree, in effect throwing away information that has been acquired
during driving. The solution in in is to split up such nodes into several parts 
that can be explored separately. Generalisations thus give rise to branches in the process tree (as does speculative unfolding).

5.3 Using DAGs

Since we employ DAGs instead of terms, our generalisation operation needs to split up DAGs. In particular, we want an operation that divides a DAG into two autonomous parts that can be expressed as terms, in order to “reassemble” the state when the transformed program is extracted from the process graph.
Example 25. Consider again the DAG from Ex. 2, shown to the left below.




Splitting up this DAG into two non-trivial DAGs can be done as indicated in the middle. To the right is shown how such a split can be represented in the term world, interpreting the roots of the lower half of the DAG as a tuple of terms. □

Splitting up a DAG thus naturally gives rise to the notion of a root list of a DAG. The example should give enough intuition to support the following definitions.

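Definition 26 (Roots, ports, subdags, and proper splits).

1. Every DAG $V$ is implicitly accompanied by a finite root list, $\mathit{roots}(V) \in \mathcal{A}^*$.

2. The ports of a DAG $V$ are the addresses outside the domain of $V$ that are reachable through the roots of $V$:

$$\mathit{ports}(V) = \{\beta \in \mathcal{A} \mid \beta \notin \mathrm{dom}(V) \wedge \exists \alpha \in \mathit{roots}(V) : \alpha \multimap \beta\}.$$

3. A DAG $V'$ is a subdag of a DAG $V$, denoted $V' \sqsubseteq V$, if $V' \subseteq V$ and

$$\forall \alpha \in \mathrm{dom}(V) \setminus \mathrm{dom}(V') : (\alpha \multimap \beta \text{ implies } \beta \notin \{\beta' \mid \exists \alpha' \in \mathit{roots}(V') : \alpha' \multimap \beta'\}).$$

4. The pair $(V'\ V'')$ is a proper split of $V$ if $V = V' + V''$, $V'' \sqsubseteq V$, and $V' \neq \emptyset \neq V''$. □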

Informally, all nodes external to a subdag can only reach nodes in the subdag through the roots of the subdag. That is, if $V' \sqsubseteq V$, then $V'$ can be “carved” out of $V$, such that $V = V' + V''$ and $V''$ interacts with $V'$ only through the roots of $V'$. A proper split is then a division of a DAG into two non-trivial parts. The split in Ex. 25 is proper.
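The ports of Def. 26 can be computed by a plain graph search. The following Haskell sketch rests on our own assumptions: a DAG is an association list from addresses to nodes, the root list is passed separately, and reachability ($\multimap$) is taken to be reflexive and transitive along child and indirection edges. The names are hypothetical.

```haskell
type Addr = Int

data Node = Node String [Addr] | Ind Addr
  deriving (Eq, Show)

type Graph = [(Addr, Node)]

-- Addresses reachable from the given roots (roots included), depth first.
reachable :: Graph -> [Addr] -> [Addr]
reachable g = go []
  where
    go seen [] = reverse seen
    go seen (a : todo)
      | a `elem` seen = go seen todo
      | otherwise     = go (a : seen) (succs a ++ todo)
    succs a = case lookup a g of
      Just (Node _ cs) -> cs
      Just (Ind b)     -> [b]
      Nothing          -> []   -- outside dom(V): a potential port

-- ports(V): the reachable addresses that lie outside dom(V).
ports :: Graph -> [Addr] -> [Addr]
ports g rs = [a | a <- reachable g rs, a `notElem` map fst g]
```

For the DAG `{1 ↦ add 2 3, 2 ↦ s 4}` with root list `[1]`, the ports are the unbound addresses 4 and 3, which is where a carved-out subdag would be plugged back in.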

We will, however, use such a split operation only as a last resort. A more sophisticated generalisation can be achieved when the offending node $\nu$ in the process tree has an ancestor $\mu$ such that $\nu$ is reducible to $\mu$. Informally, $V$ is reducible to $V_1$ if the terms in $V$ can be reconstructed by carving out a subdag $V_3$ (of $V$) and connecting it with $V_1$ via a set of indirections $V_2$.

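Definition 27 (terms, reducible).

1. $\mathit{terms}(V) = [\mathit{xtract}\,(V\,\alpha) \mid \alpha \leftarrow \mathit{roots}(V)]$.

2. $V$ is reducible to $V_1$ by $V_2 = \{(\alpha_i \mapsto \beta_i)^n\}$ and $V_3$ if

(a) $\mathit{ports}(V_1) = \{\alpha_1, \ldots, \alpha_n\}$,

(b) $\mathit{roots}(V_2) = [\alpha_1, \ldots, \alpha_n]$,

(c) $\mathit{roots}(V_3) = [\beta \mid \beta \leftarrow \mathit{ports}(V_2)]$,

(d) $V_3 \sqsubseteq V$, and

(e) $\mathit{terms}(V) = \mathit{terms}(V_1 + V_2 + V_3)$.

□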



Example 28. Assume that the DAG $V_1$ in the middle is an ancestor of the DAG $V$ to the left.

The state of $V$ can be generalised into a function b by calling the function a, representing state $V_1$, with the arguments constructed by functions c and d, representing the indirections $V_2$ and subdag $V_3$ to the right. □

We have now established some means of generalising DAGs (accompanied by root lists), and alluded to how code can be generated from such generalisations. To ensure termination of the transformation, it is crucial that every generalisation breaks down a DAG into strictly smaller components.

Remark 29. For code-generation purposes, it is furthermore beneficial to strip every DAG of its outermost constructors and indirections by yet another generalisation step. For this presentation, however, such an operation is not needed, and we therefore leave out the details.

The point of reducing a DAG to an ancestor is almost obvious: Only the 
subdag that has been carved out needs to be driven further. This property calls 
for a definition of process trees and finished nodes. 

Definition 30 (Process trees, labels, leaves, ancestors, finished). A process tree $\tau$ is a non-empty tree labelled with DAGs. For a particular node $\nu$ in $\tau$, the label of $\nu$ is denoted $\mathit{label}(\nu)$, and the set of ancestors is the set of proper predecessors of $\nu$ in $\tau$, denoted $\mathit{anc}(\tau, \nu)$. The leaves of $\tau$ are denoted by $\mathit{leaves}(\tau)$. A node $\nu$ in $\tau$ is finished if one of the following holds.

1. $\nu \notin \mathit{leaves}(\tau)$.

2. $\exists \mu \in \mathit{anc}(\tau, \nu) : \mathit{label}(\mu) = \mathit{label}(\nu)$.

3. $\mathit{terms}(\mathit{label}(\nu)) \in \mathcal{B}^*$.

A process tree is finished if all its nodes are finished.

□
That is, a node $\nu$ in a process tree is finished if $\nu$ is an interior node, if $\nu$ is a repetition, or if there are no function symbols left to drive in $\nu$.

It remains to define exactly when generalisations are needed. The following quasi-order seems to be desirable.

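Definition 31. We say that $V$ is embedded in $V'$, denoted $V \trianglelefteq V'$, if both

$$\forall t \in \mathit{terms}(V) : \exists u \in \mathit{terms}(V') : t \trianglelefteq u \quad\text{and}\quad \forall u \in \mathit{terms}(V') : \exists t \in \mathit{terms}(V) : t \trianglelefteq u.$$

We define

$$\mathit{embeddings}\ V\ V' = \{\alpha \in \mathrm{dom}(V') \mid \exists \beta \in \mathit{roots}(V) : (\mathit{xtract}\,(V\,\beta)) \trianglelefteq (\mathit{xtract}\,(V'\,\alpha))\}.$$ □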
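Def. 31 lifts term embedding to DAGs through their root terms. Assuming $\mathit{terms}(V)$ and $\mathit{terms}(V')$ have already been extracted into lists, the two conditions can be sketched as follows; `dagEmbedded` is a hypothetical name, and the block carries its own compact copy of the term-level embedding of Def. 24.

```haskell
data Term = Var String | App String [Term]
  deriving (Eq, Show)

-- Term-level homeomorphic embedding: embeds t u decides t ⊴ u (Def. 24).
embeds :: Term -> Term -> Bool
embeds (Var _) (Var _) = True
embeds t u = dive || couple
  where
    dive = case u of
      App _ us -> any (embeds t) us
      Var _    -> False
    couple = case (t, u) of
      (App s ts, App s' us) ->
        s == s' && length ts == length us && and (zipWith embeds ts us)
      _ -> False

-- Def. 31 on root-term lists: every root term of V is embedded in some root
-- term of V', and every root term of V' embeds some root term of V.
dagEmbedded :: [Term] -> [Term] -> Bool
dagEmbedded ts us =
  all (\t -> any (embeds t) us) ts && all (\u -> any (`embeds` u) ts) us
```

Both directions matter: `[fib x]` is embedded in `[fib (s y)]`, but adding a root `0` on either side breaks the relation, because `0` neither embeds in nor is embedded by any `fib` term.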

Conjecture 32. The relation $\trianglelefteq$ on DAGs is a well-quasi-order.

Remark 33. As of this writing, we have not been able to establish a proof of the above conjecture. If the conjecture is false, another suitable well-quasi-order needs to be invented. Leuschel [10] describes why well-quasi-orders are preferable over well-founded orders.



    input:  program q, term t, and a realisation I
    output: the process tree τ
    let α_0 = fresh
    let tree τ consist of a single node labelled (subst_I ∅ α_0 t ∅) with roots [α_0]
    while τ is unfinished do
      let ν be an unfinished node with V = label(ν)
      if ∄μ ∈ relanc(τ, ν) : label(μ) ⊴ V
      then let α ∈ roots(V) such that ∃V' : (V α) ⤇_{q·I} V'
           add children to ν with labels [V' | (V α) ⤇_{q·I} V']
      else let μ ∈ anc(τ, ν) such that V_1 = label(μ) ∧ V_1 ⊴ V
           if V is reducible to V_1 by V_2 and V_3
           then add three children to ν with labels V_1, V_2, and V_3
           else if ∃(V_2 V_3) : (V_2 V_3) is a proper split of V and
                                ports(V_2) ∩ (embeddings V_1 V) ≠ ∅
           then add two children to ν with labels V_2 and V_3
           else let (V_2 V_3) be a proper split of V_1
                replace all subtrees of μ with two children labelled V_2 and V_3

Fig. 3. The process-tree construction algorithm.



An algorithm for developing process trees is depicted in Fig. 3, assuming for the moment that $\mathit{relanc}(\tau, \nu) = \mathit{anc}(\tau, \nu)$.






It turns out, however, that scrutinising all ancestors is too conservative: too 
many generalisations happen. Firstly, when a DAG V is speculatively unfolded to 
a DAG $V'$ by an instantiation step, it is always the case that $V \trianglelefteq V'$. Secondly,
an instantiation step will give rise to a series of deterministic unfoldings (Turchin 
m calls these transient reductions). It is well known from partial evaluation 
and deforestation m that such deterministic unfoldings are very beneficial, in 
that they are invariants in the program q. 

We will therefore adapt the notion of relevant ancestors introduced by Sørensen & Glück.

Definition 34 (Relevant ancestors). Let $\nu$ be a node in a process tree $\tau$.

1. $\nu$ is generalised if its children have been added by a generalisation step.

2. $\nu$ is global if its parent node is generalised, and/or if its children have been produced by an instantiation step (i.e., $\nu$ cannot be unfolded by $\mapsto$ alone).

3. $\nu$ is local if it is neither generalised nor global (i.e., $\nu$ can be unfolded by $\mapsto$).

4. The set of immediate local ancestors of $\nu$, $\mathit{locanc}(\tau, \nu)$, is the set of local nodes in the longest branch of local nodes $\mu_1 \ldots \mu_n$ in $\tau$ such that $\mu_n$ is the parent of $\nu$.

5. The set of relevant ancestors of $\nu$ in $\tau$ is defined as

$$\mathit{relanc}(\tau, \nu) = \begin{cases} \{\mu \mid \mu \in \mathit{anc}(\tau, \nu) \wedge \mu \text{ is global}\} & \text{if } \nu \text{ is global} \\ \mathit{locanc}(\tau, \nu) & \text{if } \nu \text{ is local} \end{cases}$$ □



Conjecture 35. The algorithm in Fig. 3 terminates for all programs.

Informally, the restriction to relevant ancestors is safe by the following reasoning. There cannot be a branch with an infinite number of consecutive local nodes, since then there would be an embedding, resulting in a generalisation, thus creating a global node. Since every node can only be generalised once, breaking it into strictly smaller pieces, the process tree stabilises (as a Cauchy sequence). The proposed algorithm is thus an instance of what Sørensen calls an abstract program transformer [17]. However, the above remains a conjecture, in the light of the missing proof of Conj. 32.

Theorem 36. The algorithm in Fig. 3 results in a program that is equivalent to the original program.

The efficiency of the transformed program depends on the particular realisation $I$ used by the unfolding rules.² We have not been able to establish proofs of the efficiency of the transformed program with respect to $I$, but it seems likely that both of the realisations in Figs. 1 and 2 will guarantee that the transformed program is at least as efficient as the original program.

² To some extent, the efficiency also depends on the treatment of sharing between the outermost constructors; the produced code must carefully mimic such sharing.
Example 37. Let $q$ be the Fibonacci program shown earlier, and let $t = \mathit{fib}\,x$. If collapsed jungles are chosen as the underlying reduction strategy, the algorithm in Fig. 3 will produce the process tree depicted in Fig. 4. A program very similar to the following can be extracted from the process tree:



    data Nat      = 0 | s Nat
    data Pair x y = (x y)

    a 0           = s 0
    a (s x)       = b x
    b 0           = s 0
    b (s y)       = c (d y)
    c (x y)       = e x y
    d 0           = ((s 0) (s 0))
    d (s x)       = f (d x)
    e 0 y         = y
    e (s x) y     = s (e x y)
    f (x y)       = g x y
    g 0 y         = (y 0)
    g (s x) y     = h (i (g x y))
    h (x y)       = ((s y) x)
    i (x y)       = ((s y) x)


The tuples in this program stem from multiple roots. Observe that, in comparison 
to the original, the transformed program avoids making an exponential number 
of calls. □ 
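To sanity-check the extracted program, here is a direct Haskell transcription of it, using a Peano `Nat` and Haskell pairs for the tuples, compared against a doubly recursive Fibonacci function. The reference definition with fib 0 = fib 1 = 1 is our assumption, chosen to agree with Ex. 19 (fib of 4 is 5); `naiveCalls` counts the calls that this reference makes, which grows exponentially, while the transcribed program reaches `d` only a linear number of times.

```haskell
-- Peano naturals, mirroring  data Nat = 0 | s Nat
data Nat = Z | S Nat

toInt :: Nat -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

fromInt :: Int -> Nat
fromInt 0 = Z
fromInt n = S (fromInt (n - 1))

-- Transcription of the extracted program (tuples become Haskell pairs).
a :: Nat -> Nat
a Z     = S Z
a (S x) = b x

b :: Nat -> Nat
b Z     = S Z
b (S y) = c (d y)

c :: (Nat, Nat) -> Nat
c (x, y) = e x y

d :: Nat -> (Nat, Nat)
d Z     = (S Z, S Z)
d (S x) = f (d x)

e :: Nat -> Nat -> Nat      -- addition
e Z y     = y
e (S x) y = S (e x y)

f :: (Nat, Nat) -> (Nat, Nat)
f (x, y) = g x y

g :: Nat -> Nat -> (Nat, Nat)
g Z y     = (y, Z)
g (S x) y = h (i (g x y))

h :: (Nat, Nat) -> (Nat, Nat)
h (x, y) = (S y, x)

i :: (Nat, Nat) -> (Nat, Nat)
i (x, y) = (S y, x)

-- Reference: doubly recursive Fibonacci (assumed fib 0 = fib 1 = 1).
fibRef :: Int -> Int
fibRef n = if n < 2 then 1 else fibRef (n - 1) + fibRef (n - 2)

-- Number of calls fibRef makes on input n.
naiveCalls :: Int -> Int
naiveCalls n = if n < 2 then 1 else 1 + naiveCalls (n - 1) + naiveCalls (n - 2)
```

One can check by induction that `naiveCalls n = 2 * fibRef n - 1`, so the reference makes exponentially many calls, whereas evaluating `a` on input n + 2 calls `d` exactly n + 1 times.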



6 Conclusion and Related Work 

The benefits of deforestation and supercompilation are well illustrated in the 
literature, and advances in ensuring termination of these (and similar) transfor- 
mations have greatly improved their potential as automatic, off-the-shelf opti- 
misation techniques. One problem, however, remains in making these techniques 
suitable for inclusion in the standard tool-box employed by compiler writers: 
It is in general not possible to ensure that a transformed program is at least 
as efficient as the original program, without imposing severe (usually syntactic) 
restrictions on the original programs. 

In this paper we have tried to formulate a version of positive supercompilation 
that addresses the concern of ensuring efficiency of the transformed program. The 
key ingredient in this formulation is the return to viewing terms as graphs. 

In the first part of this paper, we have shown that, for a small function- 
oriented programming language, any graph-reduction implementation obeying 
two reasonable rules will lead to the same semantics. We have given two ex- 
amples of such implementations, one similar to call-by-need, and one similar to 
collapsed-jungle evaluation. 

Wadsworth invented call-by-need for the pure λ-calculus, and proved that normal-order (call-by-need) graph reduction is at least as efficient as normal-order term reduction for a certain subset of graphs representing λ-terms, and he devised an algorithm for performing normal-order reduction. Hoffmann & Plump [7], the main source of inspiration for this research, have proved that term rewrite systems could be translated into hypergraph replacement systems. They define the notion of fully-collapsed jungles in terms of morphisms on graphs, and they show uniqueness of such fully-collapsed graphs. The collapse function given in our second realisation of graph reduction is basically an implementation of their fold-morphism. Their main focus, however, is on showing that confluence and termination are preserved for a large class of term rewrite systems.

In the second part of this paper, we have presented a version of supercom- 
pilation that — when using collapsed jungles as the underlying representation 
— can give some of the speedups that collapsed-jungle evaluation can give, but 
without any run-time overhead. In this respect, we have effectively achieved 
to perform tupling (Pettorossi ^21 Chin |^), an aggressive, semi-automatic 
program transformation based on the unfold/fold framework (Burstall & Dar- 
lington Q . The key ingredient in tupling is to discover a set of progressive euts 
HU in the call graph for a program, and automatic search procedures for such 
cuts have been investigated intensively. In particular, Pettorossi, Pietropoli & 
Proietti [11] manipulate DAGs in a fashion that is very similar to our notion of
a proper split H It seems that we are able to synchronise common calls, because 
we use a local/global unfolding strategy similar to what is used in partial
deduction (see e.g., Gallagher ^ or De Schreye, et.al. Pj). 

Further work needs to be done in three directions. Firstly, we need to prove 
the efficiency and correctness properties conjectured in this paper. Secondly, we 
want to investigate the exact relationship between tupling and graph-based su- 
percompilation. Thirdly, to establish empirical results, an implementation of the 
presented transformer is under construction. In the future, we hope to bootstrap 
the transformer, in the sense of expressing it in terms of the subject language. 
Having done this, it will be possible to experiment with self-application (e.g., as 
described by Jones, Sestoft, and Søndergaard [8] or Turchin [20]).
Fig. 4. Process tree of the Fibonacci program. Global nodes are shaded.










[5] R. Gliick and A.V. Klimov. Occam’s razor in metacomputation: the notion of a 
perfect process tree. In P. Cousot, M. Falaschi, G. File, and G. Rauzy, editors, 
Workshop on Static Analysis, volume 724 of Lecture Notes in Computer Science, 
pages 112-123. Springer- Verlag, 1993. 

[6] Annegret Habel, Hans-Jorg Kreowski, and Detlef Plump. Jungle evaluation. Fun- 
damenta Informaticae, XV:37-60, 1991. 

[7] Berthold Hoffmann and Detlef Plump. Implementing term rewriting by jun- 
gle evaluation. RAIRO Theoretical Informatics and Applications, 25(5):445-472, 
1991. 

[8] N.D. Jones, P. Sestoft, and H. Spndergaard. Mix: a self-applicable partial evaluator 
for experiments in compiler generation. Lisp and Symbolic Computation, 2(1):9- 
50, 1989. 

[9] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and 
Automatic Program Ceneration. Prentice-Hall, 1993. 

[10] Michael Leuschel. On the power of homeomorphic embedding for online termi- 
nation. In Georgio Levi, editor. Static Analysis. Proceedings, volume 1503 of 
Lecture Notes in Computer Science, pages 230-245, Pisa, Italy, September 1998. 
Springer- Verlag. 

[11] A. Pettorossi, E. Pietropoli, and M. Proietti. The use of the tupling strategy 
in the development of parallel programs. In R. Paige, J. Reif, and R. Wachter, 
editors, Parallel Algorithm Derivation and Program Transformation, pages 111- 
151. Kluwer Academic Publishers, 1993. 

[12] Alberto Pettorossi. A powerful strategy for deriving efficient programs by trans- 
formation. In Conference Record of the 1984 ACM Symposium on Lisp and Func- 
tional Programming, pages 273-281. AGM, ACM, August 1984. 

[13] Gordon D. Plotkin. A structural approach to operational semantics. Technical 
Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, 
Denmark, September 1981. 

[14] M.H. Sprensen and R. Gliick. An algorithm of generalization in positive super- 
compilation. In J.W. Lloyd, editor, Logic Programming: Proceedings of the 1995 
International Symposium, pages 465-479. MIT Press, 1995. 

[15] M.H. Sprensen, R. Gliick, and N.D. Jones. A positive supercompiler. Journal of 
Functional Programming, 6(6):811-838, 1996. 

[16] Morten Heine Sprensen and Robert Gliick. Introduction to supercompilation. In 
John HatclifI, Torben Mogensen, and Peter Thiemann, editors, Partial Evaluation: 
Practice and Theory, volume 1706 of Lecture Notes in Computer Science, pages 
246-270. Springer- Verlag, 1999. 

[17] Morten Heine B. Sprensen. Convergence of program transformers in the metric 
space of trees. Science of Computer Programming, 37(l-3):163-205, May 2000. 

[18] V. F. Turchin. The algorithm of generalization in the supercompiler. In
D. Bjørner, A. P. Ershov, and N. D. Jones, editors, Partial Evaluation and Mixed
Computation, pages 531-549, Amsterdam: North-Holland, 1988. Elsevier Science
Publishers B.V.

[19] V.F. Turchin. The concept of a supercompiler. ACM Transactions on Program- 
ming Languages and Systems, 8(3):292-325, 1986. 

[20] V.F. Turchin and A.P. Nemytykh. A self-applicable supercompiler. Technical 
Report CSc. TR 95-010, City College of the City University of New York, 1995. 

[21] P.L. Wadler. Deforestation: Transforming programs to eliminate intermediate 
trees. Theoretical Computer Science, 73:231-248, 1990. 

[22] Christopher Peter Wadsworth. Semantics and pragmatics of the lambda calculus. 
Ph.D. thesis, Programming Research Group, Oxford University, September 1971. 




Higher-Order Pattern Matching for 
Automatically Applying Fusion Transformations 



Ganesh Sittampalam and Oege de Moor 

Programming Research Group, Oxford University Computing Laboratory, 
Wolfson Building, Parks Road, OX1 3QD, United Kingdom
{ganesh,oege}@comlab.ox.ac.uk



Abstract. We give an algorithm for higher-order pattern matching in 
the context of automatic program transformation. In particular, we show 
how accumulating parameter optimisations of functional programs can 
be automatically derived with the aid of programmer annotations. These 
techniques have been successfully applied to some complex manual deriva- 
tions in the literature, such as Bird’s “longest path-sequence”. 



1 Background and Motivation 

Consider the following program for calculating the minimum depth of a leaf-labelled binary tree. The language we use is Haskell [?], but the program could easily be rewritten in another functional language.

    data Tree a = Leaf a | Bin (Tree a) (Tree a)

    mindepth (Leaf a)  = 0
    mindepth (Bin s t) = 1 + min (mindepth s) (mindepth t)

The program is short, clearly stated, and it is easy to see that it will do what 
we intend. However, it is also not very efficient. Imagine a right-leaning tree; our 
program will quickly explore the left branch of the tree, but will then spend a 
significant amount of time in the right-hand branch despite the fact that it will 
quite soon become apparent that the leaf of minimum depth must be in the left 
branch and that the search of the right branch could be aborted. 

A little bit of thought allows an experienced programmer to make this defi- 
nition more efficient; simply add two accumulating parameters to keep track of 
the minimum depth found so far in the tree and the current depth, and then the 
search can be cut off for any particular branch if the current depth reaches the 
minimum already found. 

It should be noted that the following program is still not the fastest possible in all situations, since it still carries out a depth-first search, but it does represent a significant improvement over the original program. This optimisation is also a simple representative of a larger class of optimisations on tree-consuming functions, such as the alpha-beta algorithm for fast searches of game trees [?].
    md (Leaf x) d m  = min d m
    md (Bin s t) d m = if nd >= m then m
                       else md s nd (md t nd m)
      where nd = d + 1

    mindepth t = md t 0 ∞
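As a quick sanity check, which is testing and not a proof of correctness, the two programs can be run side by side. The Haskell sketch below assumes `Int` depths and uses `maxBound` in place of ∞. The tree builders `spine` and `full` are hypothetical helpers introduced only for testing.

```haskell
-- Leaf-labelled binary trees, as in the text.
data Tree a = Leaf a | Bin (Tree a) (Tree a)

-- The clear but inefficient definition.
mindepth :: Tree a -> Int
mindepth (Leaf _)  = 0
mindepth (Bin s t) = 1 + min (mindepth s) (mindepth t)

-- The accumulating version: d is the current depth and m is the
-- minimum depth found so far; a branch is abandoned once d + 1 >= m.
md :: Tree a -> Int -> Int -> Int
md (Leaf _)  d m = min d m
md (Bin s t) d m = if nd >= m then m else md s nd (md t nd m)
  where nd = d + 1

-- maxBound stands in for the paper's infinity (an assumption of this sketch).
mindepth' :: Tree a -> Int
mindepth' t = md t 0 maxBound

-- Hypothetical test trees: a spine whose left children are all leaves,
-- and a complete tree of the given height.
spine :: Int -> Tree ()
spine 0 = Leaf ()
spine k = Bin (Leaf ()) (spine (k - 1))

full :: Int -> Tree ()
full 0 = Leaf ()
full k = Bin (full (k - 1)) (full (k - 1))
```

For example, `mindepth' (spine 5)` and `mindepth (spine 5)` both give 1.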

However, it is not nearly so obvious that the resulting program is correct, and it is certainly much harder to understand than the first program, which increases the likelihood of bugs and makes it harder to maintain. What we need is a programming paradigm that combines the best of both worlds: the clarity of the first program and the efficiency of the second.

The key point to note here is that the first and second programs can be related 
by equational reasoning, and that in fact once we have made the insight that we 
should use “minimum depth found so far” and “current depth” as accumulating 
parameters, deriving the second program from the first is quite straightforward. 

Thus, it seems logical to take the approach of giving our program as the 
first piece of code, together with information on how to derive the more efficient 
second program. We will end up with source that is much easier to read and 
maintain, and with efficient object code that we can be confident is correct, as- 
suming we trust our compiler. In fact, we can take this idea a stage further and 
emit the second program as an intermediate result, thus enabling the program- 
mer to have confidence that the optimisation he or she intended has actually 
been applied. It is the development of an algorithm that will make such an 
approach possible for programs like mindepth that we will discuss in this paper. 



1.1 Fusion Transformations 

One feature of the mindepth program is that it is a recursive traversal of the 
inductive datatype Tree. This means that it can be expressed as an instance 
of a general function known as a fold, which encapsulates this pattern of pro- 
gramming. A separate fold function exists for each datatype; in this instance it 
is defined as follows: 

    foldbtree n l (Leaf a)  = l a
    foldbtree n l (Bin s t) = n (foldbtree n l s) (foldbtree n l t)
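The following minimal Haskell sketch shows this fold, with mindepth recovered as an instance of it. It uses the constructor `Bin` from the datatype declaration. `size` is a hypothetical second instance, included only to show that the fold is not specific to mindepth.

```haskell
data Tree a = Leaf a | Bin (Tree a) (Tree a)

-- The fold replaces each Leaf by l and each Bin by n.
foldbtree :: (b -> b -> b) -> (a -> b) -> Tree a -> b
foldbtree _ l (Leaf a)  = l a
foldbtree n l (Bin s t) = n (foldbtree n l s) (foldbtree n l t)

-- mindepth as a fold: leaves have depth 0, and a Bin adds 1 to the
-- smaller of its two subtree depths.
mindepth :: Tree a -> Int
mindepth = foldbtree (\b c -> 1 + min b c) (const 0)

-- Another instance: the number of leaves.
size :: Tree a -> Int
size = foldbtree (+) (const 1)
```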

The advantage of expressing our program as a fold is that it allows us to make 
use of a general rule about folds known as fusion [2]. This rule essentially states
that under certain conditions we can replace a fold followed by the application 
of another function with just a single fold. 

In the case of the example above we can specify the md function from the 
second program in terms of the original mindepth function we gave by the defi- 
nition 



    md t d m = min (mindepth t + d) m
This specification shows us that md can be expressed as a function applied to the result of a fold (the fold being mindepth), and it turns out that the fusion rule applies and in fact mirrors the manual derivation required to get from the first program to the second program. Since it tends to be difficult to direct the progress of automatic derivations, the use of this rule as a general "template" allows us to concisely specify the path a transformation system should take. Once we have done the appropriate transformation on md, we can redefine mindepth by

    mindepth t = md t 0 ∞



For each datatype, a different fusion rule exists to go with the fold function; 
given the datatype, it is reasonably easy to derive the appropriate fusion rule 
automatically. In the case of the leaf-labelled binary trees described above, the 
appropriate rule is: 



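$$
f\,(\mathit{foldbtree}\;n\;l\;t) = \mathit{foldbtree}\;n'\;l'\;t
\quad\text{if}\quad
\begin{cases}
\forall a.\; l'\,a = f\,(l\,a)\\
\forall b\,c.\; n'\,(f\,b)\,(f\,c) = f\,(n\,b\,c)
\end{cases}
$$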

Applying this rule automatically is an application of term rewriting, a process
of applying equational rules such as the one above to a program. The first step 
is to look for instances of the left-hand side of the rule in the program we are 
manipulating. Once we find one, we know that we can replace it with the right- 
hand side if we can satisfy the side conditions given. 

We take the conditions one by one; first we transform both sides of each 
condition by repeatedly applying term rewriting. In practice the rules that are 
used here will turn out to be simple ones with no side conditions, so this is 
quite a straightforward process. Once no more rules apply, our next problem 
is to find instantiations for the unknowns that will make the condition true. 
It will always be the case that unknowns will only occur on the left-hand side 
of each condition, which makes this task somewhat simpler; however, they are
of function type, which means that we are forced to actually synthesise new 
function definitions. 

This procedure is best illustrated with an example. We have developed a system MAG [?] which carries out derivations of this nature, and the actual calculation for mindepth it carries out can be found in the appendix. The first point to note is that we use a trick of "seeding" the fusion to avoid the programmer having to give their original program in terms of a fold. Next, we examine in detail the application of the treefusion rule. This contains two sub-calculations that show the exhaustive rewriting of the right-hand sides of the above side conditions (the universal quantification is expressed as equality of λ-abstractions). The final result of the second of these calculations is
    λb c d e. if a >= e then e
              else min (mindepth b + a) (min (mindepth c + a) e)
      where a = 1 + d

Now, instantiating f (here f := λt d m. min (mindepth t + d) m, taken from the specification of md) in λb c. n' (f b) (f c), which is the left-hand side of the side condition in question, gives

    λb c. n' (λh i. min (mindepth b + h) i) (λj k. min (mindepth c + j) k)

Thus, the substitution we want for n' is as follows; we will have to synthesise a rather complicated function body.

    n' := λf g d e. if a >= e then e
                    else f a (g a e)
      where a = 1 + d
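The Haskell sketch below checks this substitution by testing on small values, not by proof. It uses the form of the fusion rule given earlier: mindepth = foldbtree n l with n b c = 1 + min b c and l a = 0, and f x = λd m. min (x + d) m, so that md t = f (mindepth t). The leaf condition then gives l' := λa d m. min d m. The names `nMin`, `lMin` and `f` are illustrative. Depths are assumed to be `Int`s. The Bin condition relies on the subtree depths b and c being non-negative, which always holds for results of mindepth.

```haskell
data Tree a = Leaf a | Bin (Tree a) (Tree a)

foldbtree :: (b -> b -> b) -> (a -> b) -> Tree a -> b
foldbtree _ l (Leaf a)  = l a
foldbtree n l (Bin s t) = n (foldbtree n l s) (foldbtree n l t)

-- mindepth = foldbtree nMin lMin (the n and l of the fusion rule).
nMin :: Int -> Int -> Int
nMin b c = 1 + min b c

lMin :: a -> Int
lMin _ = 0

-- The function being fused, read off from md t d m = min (mindepth t + d) m.
f :: Int -> Int -> Int -> Int
f x d m = min (x + d) m

-- Solutions for l' and n' in the side conditions.
l' :: a -> Int -> Int -> Int
l' _ d m = min d m

n' :: (Int -> Int -> Int) -> (Int -> Int -> Int) -> Int -> Int -> Int
n' g h d e = if a >= e then e else g a (h a e)
  where a = 1 + d

-- The fused program.
md :: Tree a -> Int -> Int -> Int
md = foldbtree n' l'

-- Side conditions, checked pointwise on small non-negative values.
leafOK :: Bool
leafOK = and [ l' () d m == f (lMin ()) d m | d <- [0 .. 5], m <- [0 .. 8] ]

binOK :: Bool
binOK = and [ n' (f b) (f c) d e == f (nMin b c) d e
            | b <- [0 .. 5], c <- [0 .. 5], d <- [0 .. 5], e <- [0 .. 12] ]
```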

This paper is concerned with an algorithm for synthesising such substitutions. In previous work [?] we have given one such algorithm, known as the "one-step" algorithm. This algorithm represented an advance on the standard algorithm in the literature, developed by Huet and Lang [?]; for example, it allowed certain programming examples such as the well-known "fast reverse" optimisation to be derived. However, this algorithm proves to be inadequate for more complicated problems such as this one. We therefore extend the ideas presented there to produce the (predictably named) two-step algorithm, which has rather more limited applicability but turns out to be able to find the matches we need in those cases where the one-step algorithm is inadequate.

Of course, we are not the only people to have worked on automatically applying this and similar sorts of transformation. For example, Chitil [?] shows how shortcut deforestation [?] can be carried out based on type inference, and Onoue et al. [?] have implemented similar transformations in an optimising pass of ghc (the Glasgow Haskell Compiler); ghc also implements shortcut deforestation via programmer-specified rules.

1.2 Higher-Order Matching 

Abstracting from the particular programming language being used, we state the problem as follows. Given λ-expressions P (the pattern) and T (the term), find a substitution φ such that

$$\phi\,P = T$$



Here equality is taken modulo renaming (α-conversion), elimination of redundant abstractions (η-conversion), and substitution of arguments for parameters (β-conversion). A substitution φ that satisfies the above equation is said to be a match. Later on, we shall refine our definition of equality of λ-expressions and the notion of a match.

There is no canonical choice for φ. For example, let

    P = f x and T = 0.
Possible choices for φ include:

    f := (λa.a) and x := 0,
    f := (λa.0),
    f := (λg.g 0) and x := (λa.a),
    f := (λg.g (g 0)) and x := (λa.a),
    …



All these matches are incomparable in the sense that they are not substitution 
instances of each other. 

It should be noted that matching differs from the more commonly known 
problem of unification in that free variables are allowed in only one of the λ-expressions; this is an acceptable restriction because the side conditions we will
need to satisfy will only have free variables on the left-hand side. 

2 Preliminaries and Specification 

We start by introducing some notation, and then pin down the matching problem 
that we intend to solve. Users of our algorithm (for instance those who wish to 
understand the operation of MAG) need to know only about this section of the 
paper. 



2.1 Expressions 

An expression is a constant, a variable, a λ-abstraction or an application. There are two types of variables: bound (local) variables and free (pattern) variables. We shall write a, b, c for constants, x, y, z for local variables, p, q, r for pattern variables, and use capital identifiers for expressions. Furthermore, function applications are written F E, and lambda abstractions are written λx.E. As usual, application associates to the left, so that E1 E2 E3 = (E1 E2) E3.

It is admittedly unattractive to make a notational distinction between local 
and pattern variables, but the alternatives (De Bruijn numbering or explicit 
environments) would unduly clutter the presentation. In the same vein, we shall 
ignore all problems involving renaming and variable capture, implicitly assuming 
that identifiers are chosen to be fresh, or that they are renamed as needed. 
Equality (=) is modulo renaming of bound variables (α-conversion).

Besides renaming, we also consider equality modulo the elimination of superfluous arguments. The η-conversion rule states that (λx.E x) can be written as E, provided x is not free in E. An expression of this form is known as an η-redex; we shall write E1 ≃ E2 to indicate that E1 and E2 can be converted into each other by repeated application of η-conversion and renaming.

The β-conversion rule states how arguments are substituted for parameters: (λx.E1) E2 is converted to (x := E2) E1. A subexpression of this form is known as a β-redex. The application of this rule in a left-to-right direction is known as β-reduction. Unlike η-reduction, repeated application of β-reduction is not guaranteed to terminate.

An expression is said to be normal if it does not contain any η-redex or β-redex as a subexpression. An expression is closed if all the variables it contains are bound by an enclosing λ-abstraction.

Some readers may find it surprising that we have chosen to work with untyped λ-expressions, instead of committing ourselves to a particular type system. Our response is that types could be represented explicitly in expressions (as in Girard's second-order λ-calculus, which forms the core language of ghc [?]). Our algorithm can be adapted accordingly to expressions in which types are explicit in the syntax. However, as with the unification algorithm presented in [?], our algorithm does not depend on a particular typing discipline for its correctness.



2.2 Substitutions 

A substitution is a total function mapping pattern variables to expressions. Substitutions are denoted by Greek identifiers. We shall often specify a substitution by listing those assignments to variables that are not the identity. Substitutions are applied to expressions in the obvious manner. Composition of substitutions φ ∘ ψ is defined by first applying ψ and then φ.

We say that one substitution φ is more general than another substitution ψ if there exists a third substitution δ such that ψ = δ ∘ φ; we also write φ ≤ ψ. Intuitively, when φ ≤ ψ, the larger substitution ψ substitutes for variables that φ leaves alone, or it makes more specific substitutions for the same variables.

A substitution is said to be normal if all expressions in its range are normal, 
and closed if any variables that it changes are mapped to closed expressions. 
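To fix ideas, here is one possible Haskell encoding of expressions and substitutions, following the conventions of Sections 2.1 and 2.2. It is an illustrative sketch, not MAG's representation. Names are plain strings, and a substitution is an association list of its non-identity assignments. As in the text, variable capture is ignored, which is harmless when the substituted expressions are closed. The names `Expr`, `Subst`, `applySubst` and `compose` are hypothetical.

```haskell
-- Constants a, b, c; local variables x, y, z; pattern variables p, q, r;
-- abstractions and applications.
data Expr
  = Con String
  | Var String
  | PVar String
  | Lam String Expr
  | App Expr Expr
  deriving (Eq, Show)

-- Only the non-identity assignments are listed.
type Subst = [(String, Expr)]

-- Apply a substitution; only pattern variables are replaced. Pattern
-- variables are never bound, so no renaming is needed when the
-- substituted expressions are closed.
applySubst :: Subst -> Expr -> Expr
applySubst s e = case e of
  PVar p  -> maybe e id (lookup p s)
  Lam x b -> Lam x (applySubst s b)
  App f a -> App (applySubst s f) (applySubst s a)
  _       -> e

-- compose phi psi behaves as phi . psi: apply psi first, then phi.
compose :: Subst -> Subst -> Subst
compose phi psi =
  [ (p, applySubst phi e) | (p, e) <- psi ]
    ++ [ (p, e) | (p, e) <- phi, p `notElem` map fst psi ]
```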



2.3 Rules 

A rule is a pair of expressions, written (P → T), where P does not contain any η-redexes, and T is normal, with all variables in T being local variables, i.e. they occur under an enclosing λ-abstraction. The matching process starts off with T closed, but because it proceeds by structural recursion it can generate new rules which do not have T closed. In such a rule, a variable is still regarded as being local if it occurred under an enclosing λ-abstraction in the original rule. We call P the pattern and T the term of the rule. Rules are denoted by variables X, Y and Z; sets of rules are denoted by Xs, Ys and Zs.

The pattern P is also restricted as follows. For each subexpression F E of P such that F is flexible, i.e. has a free variable as its head, E must satisfy the following conditions. Firstly, it must contain no pattern variables. Secondly, suppose E = λx1 ... xn.B, where n is possibly 0, but B does not contain any outermost λs. Then each of x1 ... xn must occur at least once in B, and B must contain at least one constant symbol, or alternatively a local variable that is bound in P outside E. So for example, p (λx.x + x) is a valid pattern, but p (λx.x) is not, and neither is p (λx.0).
This last restriction in particular will seem rather arbitrary, but it is this that will allow us to produce a workable algorithm for solving rules. In cases where this restriction prevents the algorithm from being used, the one-step matching algorithm that we have previously developed [?] can be applied instead, and so it is not in fact a bar to carrying out the derivations we are interested in, such as that for mindepth. Others [?] have introduced even more severe restrictions on patterns for the purposes of developing useful matching or unification algorithms.
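The restriction on arguments of flexible applications is purely syntactic, so it can be checked mechanically. The Haskell sketch below is one reading of the conditions above. It does not check the separate requirement that P contain no η-redexes. All function names are hypothetical.

```haskell
data Expr = Con String | Var String | PVar String | Lam String Expr | App Expr Expr

-- An expression is flexible if its head is a pattern variable.
flexible :: Expr -> Bool
flexible (App f _) = flexible f
flexible (PVar _)  = True
flexible _         = False

hasPVar, hasCon :: Expr -> Bool
hasPVar e = case e of
  PVar _  -> True
  Lam _ b -> hasPVar b
  App f a -> hasPVar f || hasPVar a
  _       -> False
hasCon e = case e of
  Con _   -> True
  Lam _ b -> hasCon b
  App f a -> hasCon f || hasCon a
  _       -> False

-- Does local variable x occur free in the expression?
occursFree :: String -> Expr -> Bool
occursFree x e = case e of
  Var y   -> x == y
  Lam y b -> x /= y && occursFree x b
  App f a -> occursFree x f || occursFree x a
  _       -> False

-- Split E = \x1 ... xn. B, where B has no outermost lambda.
stripLams :: Expr -> ([String], Expr)
stripLams (Lam x b) = let (xs, body) = stripLams b in (x : xs, body)
stripLams e         = ([], e)

-- Check every flexible application F E in a pattern; env holds the
-- local variables bound in P around the current position.
validPattern :: Expr -> Bool
validPattern = go []
  where
    go env (Lam x b) = go (x : env) b
    go env (App f e)
      | flexible f = validArg env e && go env f
      | otherwise  = go env f && go env e
    go _ _ = True

    validArg env e =
      let (xs, body) = stripLams e
          outer = filter (`notElem` xs) env
      in not (hasPVar e)
           && all (`occursFree` body) xs
           && (hasCon body || any (`occursFree` body) outer)
```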

The application of a substitution to a rule is defined by σ(P → T) = (σP → T) (since T is closed there is no point in applying a substitution to it). The obvious extension of this definition to a set of rules applies.

A substitution φ is said to be pertinent to a rule (P → T) if all variables it changes are contained in P. Similarly, a substitution is pertinent to a set of rules if all variables it changes are contained in the pattern of one of the rules.



2.4 Restricting β-Reduction

Thus far, we have established a definition of equality (≃) that takes account of α- and η-conversion, but gives no regard to β-reduction. The reason for this is that β-reduction makes quite complex equivalences possible; for example, the expression (λx y.x + y) ((λz.z) 0) 0 is β-equivalent to 0 + 0.

Given a rule (P → T), we know that T is normal. Thus, if we are looking for a substitution φ that makes P and T equivalent, it would make sense to specify this by defining a function betanormalise that exhaustively applies β-reduction, and then stating that we want all φ such that betanormalise (φ P) ≃ T.

Unfortunately, finding all such substitutions φ is a hard problem; in fact it is not even known whether it is possible to decide whether any exist or not. Therefore, our approach is to choose a restriction to this specification that makes the problem tractable but still produces the results we need for our particular application.

The form of the specification above also gives us a clue as to how to go about 
restricting it. One way we could write the betanormalise function described 
above would be: 



    betanormalise c       = c
    betanormalise x       = x
    betanormalise p       = p
    betanormalise (λx.E)  = λx.(betanormalise E)
    betanormalise (E1 E2) = case E1' of
                              (λx.B) → betanormalise ((x := E2') B)
                              _      → (E1' E2')
      where E1' = betanormalise E1
            E2' = betanormalise E2
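A direct Haskell transcription of betanormalise, as a sketch over an illustrative `Expr` type. Like the text, it ignores variable capture, so it is only correct when bound variables have distinct, fresh names. It may also fail to terminate on terms without a normal form.

```haskell
data Expr = Con String | Var String | PVar String | Lam String Expr | App Expr Expr
  deriving (Eq, Show)

-- (x := s) e, stopping at any lambda that rebinds x. Following the
-- text, capture is ignored: bound names are assumed to be fresh.
subst :: String -> Expr -> Expr -> Expr
subst x s e = case e of
  Var y | y == x   -> s
  Lam y b | y /= x -> Lam y (subst x s b)
  App f a          -> App (subst x s f) (subst x s a)
  _                -> e

betanormalise :: Expr -> Expr
betanormalise e = case e of
  Lam x b -> Lam x (betanormalise b)
  App e1 e2 ->
    let e1' = betanormalise e1
        e2' = betanormalise e2
    in case e1' of
         -- Reducing the redex may expose new redexes, so normalise again.
         Lam x b -> betanormalise (subst x e2' b)
         _       -> App e1' e2'
  _ -> e  -- constants, local variables and pattern variables
```

On the earlier example (λx y.x + y) ((λz.z) 0) 0, with + encoded as a constant applied to two arguments, this returns the encoding of 0 + 0.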



Essentially betanormalise conducts a tree walk, reducing β-redexes as it finds them. The key point to note is that because reducing one can cause others to become visible, it is necessary to recursively call betanormalise on the result of doing this. When trying to develop a matching algorithm, we find that it is this fact in particular that makes the problem very hard. Thus, we are led to the following definition (we omit the obvious cases that simply recurse through the tree in the same way as betanormalise):

    twostep (E1 E2) = case E1' of
                        (λx.B) → unmark (markedstep ((x := mark E2') B))
                        _      → (E1' E2')
      where E1' = twostep E1
            E2' = twostep E2

Here we have written a function with a similar definition to betanormalise, but we have replaced the recursive call that occurs after reducing a β-redex with something slightly different; the idea is that we will make precisely two passes over any particular β-redex. Thus, the combination of the functions mark and markedstep is designed to apply a substitution and only reduce redexes that result from the application of the substitution. We use mark to label all the outer lambda abstractions in an expression:

    mark (λx.E) = λ'x.(mark E)
    mark E      = E

The function markedstep is another function that follows the same pattern as 
betanormalise; in this case it only reduces redexes whose left-hand side is a 
marked lambda, and then does nothing more to the result. Finally the function 
unmark removes any remaining marks; its definition is obvious. As before we 
leave out the obvious cases from the definition of markedstep:

    markedstep (E1 E2) = case E1' of
                           (λ'x.B) → (x := E2') B
                           _       → (E1' E2')
      where E1' = markedstep E1
            E2' = markedstep E2

What we are aiming at with this (admittedly rather complicated) function twostep is something that will reduce a certain class of expressions to their normal form. Very informally, the expressions we want this to be true for are those which contain subexpressions that are lambda abstractions applied to other lambda abstractions, i.e. those of similar form to (λy.y 0) (λx.x + x).

Of course this class will also contain simpler expressions, for example those with subexpressions of the form (λx.x + x) 0. However, it does not contain those of the form (λz.z (λx.x + x)) (λy.y 0); that is, those which would require three (or more) passes over the β-redex to reduce them to normal form.
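The Haskell sketch below transcribes twostep, mark, markedstep and unmark over an illustrative `Expr` type in which a `Bool` flag on each lambda records whether it is marked. As with betanormalise, capture is ignored on the assumption that bound names are fresh. The checks in the usage correspond to the examples just given.

```haskell
-- Lam carries a flag recording whether the lambda has been marked.
data Expr = Con String | Var String | PVar String
          | Lam Bool String Expr | App Expr Expr
  deriving (Eq, Show)

lam :: String -> Expr -> Expr   -- an unmarked lambda
lam = Lam False

-- (x := s) e, ignoring capture (bound names are assumed fresh).
subst :: String -> Expr -> Expr -> Expr
subst x s e = case e of
  Var y | y == x     -> s
  Lam m y b | y /= x -> Lam m y (subst x s b)
  App f a            -> App (subst x s f) (subst x s a)
  _                  -> e

-- Mark the outer lambdas of an expression.
mark :: Expr -> Expr
mark (Lam _ x b) = Lam True x (mark b)
mark e           = e

-- Remove every remaining mark.
unmark :: Expr -> Expr
unmark e = case e of
  Lam _ x b -> Lam False x (unmark b)
  App f a   -> App (unmark f) (unmark a)
  _         -> e

-- Reduce only redexes whose function part is a marked lambda, without
-- reducing the result of the substitution any further.
markedstep :: Expr -> Expr
markedstep e = case e of
  Lam m x b -> Lam m x (markedstep b)
  App e1 e2 ->
    let e1' = markedstep e1
        e2' = markedstep e2
    in case e1' of
         Lam True x b -> subst x e2' b
         _            -> App e1' e2'
  _ -> e

-- First pass: reduce each redex met on the walk. Second pass (markedstep):
-- reduce only redexes headed by the marked argument just substituted.
twostep :: Expr -> Expr
twostep e = case e of
  Lam m x b -> Lam m x (twostep b)
  App e1 e2 ->
    let e1' = twostep e1
        e2' = twostep e2
    in case e1' of
         Lam _ x b -> unmark (markedstep (subst x (mark e2') b))
         _         -> App e1' e2'
  _ -> e
```

With + encoded as a constant applied to two arguments, twostep takes (λy.y 0) (λx.x + x) to 0 + 0, but leaves (λz.z (λx.x + x)) (λy.y 0) as (λx.x + x) 0.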

Readers familiar with the theory of the λ-calculus will recognise a strong similarity between our definition and that of finite developments, which partially reduce a term by first underlining all β-redexes in the original term and then reducing all underlined redexes. However, twostep is subtly different. Consider it as a two-pass function where β-redexes are first reduced and the results are then passed to markedstep for the second pass. Then, whereas a single complete finite development would only reduce the outermost redex in (λx y z.x + y + …, twostep would reduce this expression completely in the first pass.

The more common way to restrict the matching problem is to limit the order of the terms in the images of the substitutions returned. A first-order term is a ground term such as 0, a second-order term is a function which takes first-order terms as parameters, and so on. However, this approach does not sit well with our particular application. The standard Huet and Lang algorithm [?] returns only second-order matches, which are not enough to solve problems such as mindepth. Although a general algorithm for finding third-order matches exists [?], the set of these can be infinite, and a matching algorithm that is not guaranteed to terminate is not useful in a practical program transformation system.

Another restriction that has been explored in [?] is that of higher-order
patterns. These impose a rather more stringent restriction than ours, namely that 
any free variable must only have distinct bound variables as arguments. This re- 
striction makes unification (and thus matching) decidable, and since terms of 
this form appear frequently in theorem-proving situations it is an appropriate 
restriction to impose in such environments. However, the matching problems 
generated by the transformations we are interested in do not satisfy this restric- 
tion. 

2.5 Two-Step Matches 

Having carefully designed our restriction of β-reduction, we can now specify
exactly what results our algorithm will produce. 

A rule (P → T) is satisfied by a normal substitution φ if

$$\mathit{twostep}\,(\phi\,P) \simeq T.$$

The substitution φ is then said to be a two-step match. Note that we take equality not only modulo renaming, but also modulo η-conversion. A normal substitution satisfies a set of rules if it satisfies all elements of that set. We write φ ⊢ X to indicate that φ satisfies a rule X, and also φ ⊢ Xs to indicate that φ satisfies a set of rules Xs.

The notion of a two-step match contrasts with that of a general match because of our restriction of the notion of equality: a normal substitution φ is said to be a general match if betanormalise (φ P) ≃ T. In [?], we defined one-step matches to be those φ that satisfy step (φ P) ≃ T, where step is defined (in a similar way to twostep etc. above) by

    step (E1 E2) = case E1' of
                     (λx.B) → (x := E2') B
                     _      → (E1' E2')
      where E1' = step E1
            E2' = step E2
For convenience we shall refer to a two-step match simply as a match. 
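For comparison, the following is a sketch of step in the same style, again over an illustrative `Expr` type with capture ignored. It performs a single pass: the result of a substitution is never reduced further. So on (λy.y 0) (λx.x + x) it stops at (λx.x + x) 0, whereas twostep continues to 0 + 0.

```haskell
data Expr = Con String | Var String | PVar String | Lam String Expr | App Expr Expr
  deriving (Eq, Show)

-- (x := s) e, ignoring capture (bound names are assumed fresh).
subst :: String -> Expr -> Expr -> Expr
subst x s e = case e of
  Var y | y == x   -> s
  Lam y b | y /= x -> Lam y (subst x s b)
  App f a          -> App (subst x s f) (subst x s a)
  _                -> e

-- One pass: reduce each redex met during the walk, but do not revisit
-- the expression produced by the substitution.
step :: Expr -> Expr
step e = case e of
  Lam x b -> Lam x (step b)
  App e1 e2 ->
    let e1' = step e1
        e2' = step e2
    in case e1' of
         Lam x b -> subst x e2' b
         _       -> App e1' e2'
  _ -> e
```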
2.6 Match Sets

Let Xs be a set of rules. A match set of Xs is a set M of normal substitutions such that:

- For all normal φ: φ ⊢ Xs if and only if there exists ψ ∈ M such that ψ ≤ φ.

- For all φ1, φ2 ∈ M: if φ1 ≤ φ2, then φ1 = φ2.

The first condition is a soundness and completeness property. The backwards 
direction is soundness; it says that all substitutions in a match set satisfy the 
rules. The forwards implication is completeness; it says that every match is 
represented. The second condition states that there are no redundant elements 
in a match set. 

For example, if Xs = {p (λx.x + x) → 0 + 0}, then

    {{p := (λy.0 + 0)},
     {p := (λy.y 0)}}

is a match set. But if Xs = {p (λy.y 0) → 0 + 0}, then since

    betanormalise ((λz.z (λx.x + x)) (λy.y 0)) = 0 + 0

we have that

    {p := (λz.z (λx.x + x))}

is a general match, but because

    twostep ((λz.z (λx.x + x)) (λy.y 0)) = (λx.x + x) 0

it is not a member of the match set.

In general, match sets are unique up to pattern variable renaming, and consequently we shall speak of the match set of a set of rules. In the remainder of this paper, we present an algorithm that computes match sets. We shall omit the proof that this algorithm is correct, but will sketch a proof that match sets include all third-order general matches.

First, we show that twostep is equivalent to full β-reduction for all terms of third order or below. Then, since general matches satisfy the specification betanormalise (φ P) ≃ T, all third-order general matches must also satisfy twostep (φ P) ≃ T and will thus be two-step matches as well.

Showing the equivalence of twostep and β-reduction involves tracking β-redexes from the parameter to the result of twostep. Suppose that in the redex (λx.E) T, λx.E is of order n. Then T must be of order at most n - 1, and thus any new redexes created by reducing (λx.E) T to (x := T) E must be of order at most n - 1 too.

Now, if E is a term, then in calculating twostep E any β-redex will be reduced once and then any resulting redexes will be reduced again by markedstep. Thus, if a redex of order n exists in twostep E, one of order n + 2 or higher must exist in E. Since first-order β-redexes are not possible, it must be the case that all redexes of third order or below are completely removed by twostep, and hence it is equivalent to full β-reduction for terms of third order or below. This completes our proof.
3 Outline of an Algorithm

Our matching algorithm operates by progressively breaking down a set of rules 
until there are none left to solve. We outline its structure, then give the details 
of the function that will implement the key part of our algorithm. 

The function matches takes a set of rules and returns a match set. It is defined recursively (using the notation of Haskell [?]):

    matches :: [Rule] -> [Subst]
    matches []       = [idSubst]
    matches (X : Xs) = [ (φ ∘ σ) | (σ, Ys) <- resolve X,
                                   φ <- matches (σ (Ys ++ Xs)) ]

That is, the empty set of rules has the singleton set containing the identity substitution as a match set. For a non-empty set of rules (X : Xs), we take the first rule X and break it down into a (possibly empty) set of smaller rules Ys together with a substitution σ which makes Ys equivalent to X. We then combine the Ys with Xs, the remainder of the original rules, apply σ, and return the results of a recursive call to matches combined with σ.

Clearly it would be advantageous to arrange the rules in such a manner 
that we first consider rules where resolve X is small, perhaps only a singleton. 
There is no particular reason why we should take the union of Ys and Xs by list 
concatenation: we could place ‘cheap’ rules at the front, and ‘expensive’ rules at 
the back. 
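The recursion in matches does not depend on the details of resolve. To illustrate just the driver, the Haskell sketch below pairs it with a deliberately simple first-order stand-in for resolve. This stand-in is hypothetical and is not the higher-order resolve of Section 3.1. It works over a hypothetical `Term` type in which pattern variables can only stand for whole subterms, and right-hand sides are assumed to be ground.

```haskell
-- Hypothetical first-order setting: pattern variables PV stand for
-- whole terms, and the term side of every rule is ground.
data Term = PV String | Fn String [Term]
  deriving (Eq, Show)

data Rule = Term :-> Term
  deriving (Eq, Show)

type Subst = [(String, Term)]

idSubst :: Subst
idSubst = []

applyT :: Subst -> Term -> Term
applyT s (PV p)    = maybe (PV p) id (lookup p s)
applyT s (Fn f ts) = Fn f (map (applyT s) ts)

-- sigma (P -> T) = (sigma P -> T): the term side is never changed.
applyRule :: Subst -> Rule -> Rule
applyRule s (p :-> t) = applyT s p :-> t

-- compose phi sigma applies sigma first, then phi.
compose :: Subst -> Subst -> Subst
compose phi sigma =
  [ (p, applyT phi e) | (p, e) <- sigma ]
    ++ [ b | b@(p, _) <- phi, p `notElem` map fst sigma ]

-- First-order stand-in for resolve: a pattern variable is solved at
-- once; two applications of the same symbol split into argument rules.
resolve :: Rule -> [(Subst, [Rule])]
resolve (PV p :-> t) = [([(p, t)], [])]
resolve (Fn f ps :-> Fn g ts)
  | f == g && length ps == length ts = [(idSubst, zipWith (:->) ps ts)]
resolve _ = []

-- The driver, following the definition of matches in the text.
matches :: [Rule] -> [Subst]
matches []       = [idSubst]
matches (x : xs) =
  [ compose phi sigma
  | (sigma, ys) <- resolve x
  , phi <- matches (map (applyRule sigma) (ys ++ xs)) ]
```

Because σ is applied to the remaining rules before recursing, a pattern variable that occurs twice must receive the same value in both places. For example, f(p, p) matches f(a, a) but not f(a, b).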

The function that breaks up X into smaller rules is called resolve. Readers 
who are familiar with the logic programming paradigm will recognise it as being 
analogous to the concept of “resolution”. We specify the behaviour of resolve 
through certain properties; let 



$$[(\sigma_0, \mathit{Ys}_0), (\sigma_1, \mathit{Ys}_1), \ldots, (\sigma_k, \mathit{Ys}_k)] = \mathit{resolve}\;X.$$

We require that

- For all normal substitutions φ:

$$(\phi \vdash X) = \bigvee_i \left(\phi \vdash \mathit{Ys}_i \wedge \sigma_i \le \phi\right).$$

- For all normal substitutions φ and indices i and j:

$$(\phi \vdash \mathit{Ys}_i) \wedge (\phi \vdash \mathit{Ys}_j) \Rightarrow i = j.$$

- For each index i, σi is pertinent to X, closed and normal.

- The pattern variables in Ysi are contained in the pattern variables of X.

- For each index i:

$$\mathit{Ys}_i \prec X.$$
The first of these is a soundness and completeness condition: it says that all
relevant matches can be reached via resolve, and that resolve stays true to 
the original set of rules. The second condition states that resolve should not 
return any superfluous results. The third and fourth conditions are technical 
requirements we need to prove the non-redundancy of matches. Finally, the last 
condition states that we make progress by applying resolve; i.e. that the process 
of breaking down the set of rules will eventually terminate. 



3.1 Defining Resolve 

Defining resolve is done on a case-by-case basis depending on the syntactic 
structure of the argument rule; the individual cases are summarised in the table 
below, the intention being that the first applicable clause is used. The reader 
is reminded of the notational distinction we make between variables: x and y 
represent local variables, a and b constants, and p a pattern variable. 



X                                    resolve X

x → y                                [(id, [])], if x = y
                                     [ ], otherwise
p → T                                [(p := T, [])], if T is closed
                                     [ ], otherwise
a → b                                [(id, [])], if a = b
                                     [ ], otherwise
(λx.P) → (λx.T)                      [(id, [P → T])]
(λx.P) → T                           [(id, [P → (T x)])]
(F E) → T, F flexible                [(id, [F → etaRed (λx.B)]) | x fresh, B ← abstracts x E T]
(F E) → (T1 T2), F not flexible      [(id, [(F → T1), (E → T2)])]
P → T                                [ ]



Let us now examine each of these clauses in turn. 

The first clause says that two local variables match only if they are equal. 

The second clause says that we can solve a rule (p → T) where the pattern is a pattern variable by making an appropriate substitution. Such a substitution can only be made, however, if T does not contain any local variables occurring without their enclosing λ: since the original term cannot contain any pattern variables, any variables in T must have been bound in the original term, and so the substitution would move these variables out of scope.

The third clause deals with matching of constants a and b. These only match when they are equal.

Next, we consider matching of λ-abstractions (λx.P) and (λx.T). Here it is assumed that the clauses are applied modulo renaming, so that the bound variable on both sides is the same, namely x. To match the λ-abstractions is to match their bodies.
Recall, however, that we took equality in the definition of matching not only modulo renaming, but also modulo η-conversion. We therefore have to cater for the possibility that the pattern contains a λ-abstraction, but the term (which was assumed to be normal) does not. This is the purpose of the clause for matching (λx.P) against a term T that is not an abstraction: we simply expand T to (λx.T x) and then proceed as with the previous clause.

The next two clauses deal with the cases where the pattern is an application.
Given a rule F E → T, the aim is to break it down into new rules by finding pairs
of terms against which to match F and E respectively. The second of the two
clauses is simple. It deals with the case where the function part of the pattern is
not flexible, i.e. it does not have a free variable at its head, so there is no point
in trying to construct a new function to match against it; if the term is also an
application we simply match up the functions and arguments respectively, and if
not we do nothing.

The difficult case arises when the function part of the pattern is flexible; it
is at this stage that the novel part of our algorithm comes into play, and our
treatment of this case is described in detail in the following section. In essence,
the abstracts function uses the structure of E and T to "guess" at possible bodies
for a function against which F could be matched; an enclosing λ is then added,
and if a top-level η-redex exists it is stripped off by the function etaRed.

The final rule states that if none of the above rules were applicable then no 
matches exist. 
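To make the clause table concrete, here is a minimal Python sketch of resolve. The tuple encoding of terms, the helper names (freevars, rename, flexible, eta_red) and the decision to pass abstracts in as a parameter are assumptions of this sketch, not the authors' implementation; it also assumes bound variables are renamed apart, so that renaming a binder never captures a variable.

```python
import itertools

# Term encoding assumed for this sketch (not the authors' data types):
#   ('var', n) local variable    ('pvar', n) pattern variable
#   ('con', c) constant          ('app', f, a) application
#   ('lam', n, body) abstraction
# A rule (P -> T) is a pair (pattern, term); a result is (substitution, new rules).

_fresh = itertools.count()


def freevars(t):
    """Free local variables of a term (pattern variables are not counted)."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'app':
        return freevars(t[1]) | freevars(t[2])
    if t[0] == 'lam':
        return freevars(t[2]) - {t[1]}
    return set()                           # 'con' and 'pvar'


def rename(t, old, new):
    """Rename free occurrences of local variable old to new (new assumed fresh)."""
    if t[0] == 'var':
        return ('var', new) if t[1] == old else t
    if t[0] == 'app':
        return ('app', rename(t[1], old, new), rename(t[2], old, new))
    if t[0] == 'lam' and t[1] != old:
        return ('lam', t[1], rename(t[2], old, new))
    return t


def flexible(p):
    """A pattern is flexible if a pattern variable sits at its head."""
    while p[0] == 'app':
        p = p[1]
    return p[0] == 'pvar'


def eta_red(t):
    """Strip a top-level eta-redex: lam x.(B x) becomes B if x is not free in B."""
    if (t[0] == 'lam' and t[2][0] == 'app' and t[2][2] == ('var', t[1])
            and t[1] not in freevars(t[2][1])):
        return t[2][1]
    return t


def resolve(rule, abstracts):
    """One resolution step; the first applicable clause is used."""
    p, t = rule
    if p[0] in ('var', 'con'):             # x -> y and a -> b
        return [({}, [])] if p == t else []
    if p[0] == 'pvar':                     # p -> T, only for closed T
        return [({p[1]: t}, [])] if not freevars(t) else []
    if p[0] == 'lam':
        x, body = p[1], p[2]
        if t[0] == 'lam':                  # (lam x.P) -> (lam y.T): rename y to x
            return [({}, [(body, rename(t[2], t[1], x))])]
        return [({}, [(body, ('app', t, ('var', x)))])]   # eta-expand T
    if p[0] == 'app':
        f, e = p[1], p[2]
        if flexible(f):                    # (F E) -> T with F flexible
            x = f'_x{next(_fresh)}'
            return [({}, [(f, eta_red(('lam', x, b)))])
                    for b in abstracts(x, e, t)]
        if t[0] == 'app':                  # (F E) -> (T1 T2), F not flexible
            return [({}, [(f, t[1]), (e, t[2])])]
    return []                              # P -> T: no match
```

Each result pairs a substitution with the new rules still to be solved; a complete matcher would resolve those rules in turn and combine the substitutions.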

3.2 Abstracting New Function Bodies 

The abstracts function is used when resolving a rule of the form (F E → T),
where F is flexible and E satisfies the restrictions on patterns described above.
As we remarked earlier, the overall goal when breaking down this rule is to generate
new rules by finding pairs of terms against which to match F and E respectively.
Since E does not contain pattern variables, the only term it can match is E itself,
and thus the problem is to find valid terms to match F against.

In other words, given the rule (F E → T), we would like to generate a new
rule (F → T') which is satisfied by precisely the same match set.

Now, consider the twostep function. Suppose that φ ⊢ (F E → T); then
twostep (φ (F E)) = T, and so twostep ((φ F) E) = T, since E has no pattern
variables. Next, assume that twostep (φ F) is a λ-abstraction, of the form λx.B;
although this is a technically incorrect simplification, it turns out that this is
correctly balanced by the use of etaRed described above.

Then the definition of twostep tells us that unmark (markedstep ((x :=
mark E) B)) = T; thus for each B that satisfies this equation, φ ⊢ (F E → T)
will imply φ ⊢ (F → λx.B). Some rather more rigorous reasoning than the above
shows that if we find all such B, the appropriate reverse implication will hold,
which establishes the completeness of resolve.

Thus, the goal of the function abstracts is to find values for B. It takes as
parameters a variable x, the expression E and the term T, which have the roles
outlined above. Clearly one possible value for B is T itself; but consider the
changes unmark (markedstep ((x := mark E) B)) makes to the expression B:
it replaces all occurrences of x in B with E and then performs one β-reduction
pass over the newly created occurrences of E and their arguments. Thus, when
finding values for B, we should look for subexpressions in T that are the result of
performing a β-reduction pass over E applied to a set of arguments; we can then
(selectively) replace these subexpressions with the variable x applied to this set
of arguments. We call such subexpressions instances of E in T, and the process
of replacing them with the variable x applied to appropriate arguments is known
as abstracting.

For example, if T is 1 + (0 + 0) and E is λy.y + y, then the subexpression
0 + 0 is the only instance of E in T, and can be replaced with x 0. Thus the
result of abstracts x E T will be {1 + (0 + 0), 1 + x 0}.

This procedure is somewhat complicated by the fact that instances may
overlap; for example, consider T = (1 + 1) + (1 + 1) and E = λy.y + y; then both
the entire term T and the two occurrences of (1 + 1) are instances of E in T. It is
for this reason that our algorithm searches for instances by an iterative process,
which we shall now describe.

The function abstracts is the union of the results of applying the function
abstracts_n to the same arguments for all n. The contents of abstracts_n x E T
will be all the possible ways in which precisely n instances of E in T can be
abstracted.



$$\mathit{abstracts}\ x\ E\ T = \bigcup \{\mathit{abstracts}_n\ x\ E\ T \mid n = 0, 1, \ldots\}$$

Clearly, abstracting 0 instances will just give us T, so abstracts_0 just returns
{T}. If we know all expressions in which n instances have been abstracted, then
we can generate all those in which n + 1 instances have been abstracted by simply
abstracting one more instance in all possible ways from each. Since the body of E
must contain at least one constant or local variable bound outside E, and since all
parameters to E must appear in the body of E at least once, each time an instance
is abstracted it must reduce the number of occurrences of this constant or local
variable by at least one. Thus eventually no more instances of E will remain, and
there will be some n for which abstracts_n x E T will be empty, at which point we
can terminate the iteration.

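$$
\begin{aligned}
\mathit{abstracts}_0\ x\ E\ T &= \{T\} \\
\mathit{abstracts}_{n+1}\ x\ E\ T &= \{C \mid B \in \mathit{abstracts}_n\ x\ E\ T,\ C \in \mathit{abstract}\ x\ E\ B\}
\end{aligned}
$$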

We use the function abstract to carry out a single iteration of this procedure:
it searches all subexpressions of the current term T to check whether they are
instances of E, and for each instance found it produces a result in which that
instance is replaced with x applied to the appropriate arguments. For example,
abstract x (λy.y + y) (1 + (0 + 0)) = {1 + x 0}.

Note that only subexpressions that do not already contain the variable x
are chosen; this is because we want to guarantee not to change the number of
occurrences of x already in T (to fit in with the goal of abstracts_n). Doing this
means that outermost instances are always abstracted first.
$$\mathit{abstract}\ x\ E\ T = \{\mathit{replace}\ \mathit{loc}\ R\ T \mid (S, \mathit{loc}) \in \mathit{subexps}\ T,\ x \notin \mathit{freevars}(S),\ R \in \mathit{instance}\ x\ E\ S\}$$

A subexpression is described as a term together with a sequence representing its
position in the term it was taken from (its location). We omit definitions of

  subexps :: Exp → {(Exp, Location)}
  replace :: Location → Exp → Exp → Exp

These functions respectively return all the subexpressions of a term, and splice
a subexpression into an expression at a given location.

The instance function checks whether S is an instance of E, and if so returns
the appropriate expression involving x with which to replace S.

$$\mathit{instance}\ x\ E\ S = \{x\,(\phi\,x_1) \ldots (\phi\,x_n) \mid (x_1, \ldots, x_n) = \mathit{args}\ E,\ \phi \in \mathit{simplematch}\ (\mathit{body}\ E)\ S\}$$

We omit definitions of args and body, which simply express E in the form
λx1 ... λxn.B, where B is not a λ-abstraction and n may be equal to 0, and
return (x1, ..., xn) and B respectively.

The function simplematch is specified as follows. It is simple to implement;
it performs pattern matching by a straightforward recursive comparison of the
structure of the pattern and the term. The variables x1, ..., xn are treated as
pattern rather than local variables in the application of simplematch.

$$\phi \in \mathit{simplematch}\ P\ T \iff \phi P = T \wedge \forall \psi.\ \psi P = T \Rightarrow \phi \le \psi$$

Note that simplematch P T will always contain 0 or 1 elements. 
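The definitions of abstracts, abstracts_n, abstract, instance and simplematch translate almost directly into code. The following Python sketch uses a hypothetical tuple encoding of terms and supplies simple versions of subexps, replace, args and body (the last two merged into a helper args_body), whose definitions the text omits. It assumes bound variables have been renamed apart and relies on the pattern restrictions for termination; it reproduces the example given earlier for T = 1 + (0 + 0) and E = λy.y + y.

```python
# Terms (tuple encoding assumed for this sketch):
#   ('var', n)  ('con', c)  ('app', f, a)  ('lam', n, body)

def app(*ts):
    """Left-nested application: app(f, a, b) == ('app', ('app', f, a), b)."""
    t = ts[0]
    for a in ts[1:]:
        t = ('app', t, a)
    return t

def plus(a, b):
    return app(('con', '+'), a, b)

def freevars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'app':
        return freevars(t[1]) | freevars(t[2])
    if t[0] == 'lam':
        return freevars(t[2]) - {t[1]}
    return set()

def subexps(t, loc=()):
    """All (subterm, location) pairs; a location is a path of child indices."""
    yield t, loc
    if t[0] == 'app':
        yield from subexps(t[1], loc + (1,))
        yield from subexps(t[2], loc + (2,))
    elif t[0] == 'lam':
        yield from subexps(t[2], loc + (2,))

def replace(loc, r, t):
    """Splice r into t at location loc."""
    if not loc:
        return r
    parts = list(t)
    parts[loc[0]] = replace(loc[1:], r, t[loc[0]])
    return tuple(parts)

def args_body(e):
    """Write e as lam x1 ... lam xn.B with B not an abstraction (n may be 0)."""
    xs = []
    while e[0] == 'lam':
        xs.append(e[1])
        e = e[2]
    return xs, e

def simplematch(pvars, p, t, sub=None):
    """Most general phi with phi(p) == t, treating the names in pvars as
    pattern variables; returns a dict, or None if there is no match."""
    sub = {} if sub is None else sub
    if p[0] == 'var' and p[1] in pvars:
        if p[1] in sub:
            return sub if sub[p[1]] == t else None
        return {**sub, p[1]: t}
    if p[0] != t[0]:
        return None
    if p[0] == 'app':
        sub = simplematch(pvars, p[1], t[1], sub)
        return None if sub is None else simplematch(pvars, p[2], t[2], sub)
    if p[0] == 'lam':  # binders assumed to be renamed apart already
        return simplematch(pvars, p[2], t[2], sub) if p[1] == t[1] else None
    return sub if p == t else None

def instance(x, e, s):
    """{x (phi x1) ... (phi xn)} if s is an instance of e, else the empty set."""
    xs, body = args_body(e)
    phi = simplematch(set(xs), body, s)
    if phi is None or any(xi not in phi for xi in xs):
        return set()
    return {app(('var', x), *[phi[xi] for xi in xs])}

def abstract(x, e, t):
    """Abstract one more instance of e, in each x-free subterm of t."""
    return {replace(loc, r, t)
            for s, loc in subexps(t) if x not in freevars(s)
            for r in instance(x, e, s)}

def abstracts(x, e, t):
    """Union of abstracts_n x e t for n = 0, 1, ... until a level is empty."""
    result, level = set(), {t}
    while level:
        result |= level
        level = {c for b in level for c in abstract(x, e, b)}
    return result

one, zero = ('con', '1'), ('con', '0')
E = ('lam', 'y', plus(('var', 'y'), ('var', 'y')))   # lam y. y + y
T = plus(one, plus(zero, zero))                       # 1 + (0 + 0)
print(abstracts('x', E, T))   # the two terms 1 + (0 + 0) and 1 + x 0
```

On the overlapping example T = (1 + 1) + (1 + 1) the same code yields six candidate bodies, including x (1 + 1), x (x 1) and (x 1) + (x 1).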

4 Implementation 

As we remarked earlier, we have flexibility as to the order in which we apply 
resolve to rules, and thus it makes sense to delay those that will be slow for 
as long as possible. The most expensive part of this algorithm by far is the 
function abstracts, and so a good implementation will only process rules where 
the pattern is an application with a flexible head when no others remain. 
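A minimal sketch of this scheduling policy, assuming rules are (pattern, term) pairs in a hypothetical tuple encoding of terms (('pvar', n) for pattern variables, ('app', f, a) for applications): any rule that does not need abstracts is picked first, and a flexible application is only chosen when no other rule is left.

```python
def is_flexible_application(pattern):
    """True if pattern is an application with a pattern variable at its head."""
    if pattern[0] != 'app':
        return False
    head = pattern
    while head[0] == 'app':
        head = head[1]
    return head[0] == 'pvar'


def next_rule(rules):
    """Index of the rule to resolve next, or None if no rules remain.

    Rules whose pattern is a flexible application (the ones that call the
    expensive abstracts function) are deferred until nothing else is left."""
    for i, (pattern, _term) in enumerate(rules):
        if not is_flexible_application(pattern):
            return i
    return 0 if rules else None
```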

The implementation of abstracts itself can be improved from our somewhat 
abstract description. We have used sets throughout to represent lists of results; 
a concrete program would have to implement these somehow and make certain 
to remove duplicates where appropriate. 

In fact, our restrictions on the pattern ensure that for rules of the form 
F E ^ T with F flexible, we have that the body of E must contain a constant 
or externally bound local variable. This means that we can always identify all 
the possible instances of E in T immediately (since there can be at most one
per occurrence in T of the constant or local variable from E). Therefore, instead 
of going through the iterative procedure given earlier which searches for and 
abstracts instances one at a time, we can first find all the possible instances 
and then carefully work out which combinations of these will produce a valid
result before abstracting each combination to produce a set of results; special 
care needs to be taken to handle the problem of overlapping instances described 
earlier. This approach also has the advantage that it will not produce duplicate 
expressions, and so there is no need to add an explicit check for them. 

5 Discussion 

The use of higher-order matching means that many quite complex program trans- 
formations can be expressed as easy to understand rewrite rules. Apart from our 
own system MAG [7], systems such as KORSO im make use of it, while Ergo
uni uses higher-order unification. However, there do also exist many successful 
transformation systems that avoid it completely, for example Kids ini- The two 
standard objections to its use relate to efficiency (even second-order matching 
is known to be NP-hard PS|), and the need to impose a specific typing disci- 
pline. Our algorithm operates on untyped terms, which eliminates the second 
issue; the first is significant but thus far we have been able to obtain adequate 
performance for our needs with an implementation that still has plenty of room 
for improvement. 

When we first produced MAG we gave a number of examples of its use. At this 
time we only had the one-step matching algorithm available and we were thus 
forced to manually specialise the fusion rules for the more complicated examples 
so that the derivations could be carried out. With this algorithm we have removed 
the need to do this in all of the accumulating parameter optimisations described 
there, and indeed have successfully applied it to yet more complicated problems 
such as the longest path-sequence problem.

Other examples of optimisations MAG is able to apply include cat-elimination 
in various contexts such as “fast reverse” and the post-order traversal of a tree. 
Various tree traversals are also susceptible to this form of transformation; in 
addition to the “minimum depth” example described earlier, fast versions of 
programs to calculate the breadth-first traversal and to label nodes of a tree 
with their depth can be derived. A rather more complicated tree processing
algorithm, αβ-pruning, can also be calculated from the inefficient specification.

We have not yet found an example of such a derivation (i.e. one that
involves adding accumulating parameters to a recursive traversal of an inductive
datatype) that MAG cannot handle using a standard fusion rule and a com- 
bination of the one-step and two-step matching algorithms. We would however 
expect that significantly more complex problems would be more difficult to han- 
dle because of the rather primitive exhaustive rewriting process used in the side 
calculations. We also anticipate some difficulties with complex problems involv- 
ing mutually recursive datatypes, because fusion rules on these types have some 
side conditions which give rise to patterns that violate our restriction, but may 
in some cases require results not found by the one-step algorithm. 

Of course, adding accumulating parameters is just one kind of optimising 
transformation. Another major class of transformations that we do not consider 
here is tupling, i.e. the maintenance of additional useful values in the result of
the function in question rather than in the arguments. A simple example of 
this is the program to calculate the Fibonacci numbers; written the completely 
naive way this will take exponential time to run, but if we keep track of both 
the current result and the previous result, this becomes linear time. However, in 
order to carry out the required derivations automatically our matching algorithm 
needs to be able to synthesise functions that take tuples as arguments, which 
the two-step algorithm cannot do. This is an issue we hope to address in future 
work. 
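As a sketch of the tupling idea (plain Python, not MAG output), the naive definition recomputes the same values over and over, while the tupled version returns the current result together with the previous one, so that a single linear recursion suffices. The function names are illustrative only.

```python
def fib_naive(n):
    """Direct definition: exponentially many recursive calls."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


def fib_pair(n):
    """Tupled version for n >= 1: returns (fib n, fib (n - 1)) in n calls."""
    if n == 1:
        return 1, 0
    current, previous = fib_pair(n - 1)
    return current + previous, current


def fib_tupled(n):
    return 0 if n == 0 else fib_pair(n)[0]
```

Synthesising the function that consumes such a pair is exactly what the two-step algorithm cannot do.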

The naming of our algorithms naturally gives rise to the question “what 
about n-step matching?”. While it should in theory be possible to generalise 
the step and twostep functions appropriately, there are two reasons why this is 
unlikely to be worthwhile. Firstly, and most importantly, we know of no practical 
use for an algorithm that returns more results than two-step matching. Secondly, 
there was a significant jump in complexity between the one-step and two-step 
algorithms; extending our algorithm further and in particular finding a suitable 
set of restrictions on the pattern to make the set of results finite would most 
likely be very difficult. 

It should be noted that our approach to automatic program transformation 
does not decrease the amount of insight required of a programmer; it is still 
incumbent on him or her to spot the opportunity for an optimisation and to 
specify the appropriate rewrite rules to allow this optimisation to be applied. 
Also, if invalid rewrite rules are chosen, the result will be an incorrect program. 
Our system removes much of the drudgery associated with the derivation, and 
allows the insights to be recorded in their original form. In addition, the source- 
to-source nature of our transformations means that an iterative approach to 
finding the appropriate set of rewrite rules is possible; the programmer can 
make a first guess at the correct set, look at the details of the calculation this 
produces, and modify the set appropriately before trying again. 
Appendix

Minimum Depth Derivation 

We refer throughout to the automatic derivation of the efficient program for cal- 
culating the minimum depth of a binary tree. Here we show how this derivation 
was done with MAG [7], a transformation system within which we have
implemented our algorithm.

The input to the system is the following file, which specifies the rewrite 
rules we want to use. Of particular interest are the first, which specifies the 
optimisation we want applied, and the last, which is the fusion rule on binary 
trees. All the other rules contain the facts about the functions involved that will 
be needed to carry out the necessary side calculations when applying the fusion 
rule. 

To specify the optimisation, we define the function md that takes the accu- 
mulating parameters we want in terms of the inefficient mindepth; in addition 
we “tell” MAG that we want to carry out tree fusion on the variable t by re- 
placing it with a tree fold that is equivalent to the identity function; this fold 
acts as a “seed” to the fusion. This means that it is not necessary to provide the 
mindepth program already expressed as a fold. 

The fusion rule has side conditions which are universally quantified. Since 
functions are equal if and only if they are equal for all values of their arguments, 
we can express this by converting the universal quantifiers into λ-abstractions on
each side of the equation. Note that we could make MAG automatically derive 
the appropriate fusion rule from the datatype definition, which would reduce the 
user effort required to specify the transformation. 

We have omitted a second input file which is required to provide type infor- 
mation to MAG. 

md:         md t d m = min (mindepth (foldbtree Bin Leaf t) + d) m;
plusunit:   0+a = a;
plusassoc:  (a+b)+c = a+(b+c);
minassoc:   min (min a b) c = min a (min b c);
cutmin:     min (min mq mr + s) c = if s>=c
                                    then c
                                    else min (min (mq+s) (mr+s)) c;
mindepth0:  mindepth (Leaf a) = 0;
mindepth1:  mindepth (Bin x y) = min (mindepth x) (mindepth y) + 1;
treefusion: h (foldbtree plus f t) = foldbtree times g t,
              if { \b -> h (f b) = \b -> g b;
                   \x y -> h (plus x y) = \x y -> times (h x) (h y) }

The following derivation is produced by asking MAG to exhaustively rewrite the 
expression md. It should be noted that the final result obtained is expressed in 
terms of foldbtree; it would be a relatively simple matter to inline the definition 
of this function and thus produce a rather more readable program with the 
recursion made explicit. 
md
= { md }
(\ a b -> min (mindepth (foldbtree Bin Leaf a) + b))
= { treefusion
      (\ b c -> min (mindepth (Leaf b) + c))
    = { mindepth0 }
      (\ b c -> min (0 + c))
    = { plusunit }
      (\ b -> min)

      (\ b c d -> min (mindepth (Bin b c) + d))
    = { mindepth1 }
      (\ b c d -> min ((min (mindepth b) (mindepth c) + 1) + d))
    = { plusassoc }
      (\ b c d -> min (min (mindepth b) (mindepth c) + (1 + d)))
    = { cutmin }
      (\ b c d ->
         (\ e -> if a >= e then e
                 else min (min (mindepth b + a) (mindepth c + a)) e)
         where a = 1 + d)
    = { minassoc }
      (\ b c d ->
         (\ e -> if a >= e then e
                 else min (mindepth b + a) (min (mindepth c + a) e))
         where a = 1 + d)
  }
foldbtree (\ d e f ->
             (\ g -> if a >= g then g else d a (e a g))
             where a = 1 + f)
          (\ h -> min)
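Inlining foldbtree into the derived program gives an explicitly recursive version. The following Python transcription is a sketch: the tuple encoding of trees, the function names, and the wrapper that starts the accumulators at depth 0 with an infinite bound are our assumptions. Here d is the depth accumulated so far, m the best depth found so far, and the test a >= m is the cut introduced by the cutmin rule.

```python
import math

# Binary trees (encoding assumed): ('Leaf', value) or ('Bin', left, right).

def mindepth(t):
    """Inefficient specification: always explores the whole tree."""
    if t[0] == 'Leaf':
        return 0
    return min(mindepth(t[1]), mindepth(t[2])) + 1


def md(t, d, m):
    """md t d m == min(mindepth t + d, m), using accumulating parameters."""
    if t[0] == 'Leaf':
        return min(d, m)                    # from (\ h -> min)
    a = 1 + d
    if a >= m:
        return m                            # cut: this subtree cannot beat m
    return md(t[1], a, md(t[2], a, m))      # d a (e a g)


def fast_mindepth(t):
    return md(t, 0, math.inf)
```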




Dynamic Partial Evaluation 



Gregory T. Sullivan 



Artificial Intelligence Laboratory 
Massachusetts Institute of Technology 
gregs@ai.mit.edu



Abstract. Dynamic partial evaluation performs partial evaluation as a 
side effect of evaluation, with no previous static analysis required. A com- 
pletely dynamic version of partial evaluation is not merely of theoretical 
interest, but has practical applications, especially when applied to dy- 
namic, reflective programming languages. Computational reflection, and 
in particular the use of meta-object protocols (MOPs), provides a pow- 
erful abstraction mechanism, providing programmatic “hooks” into the 
interpreter semantics of the host programming language. Unfortunately, 
a runtime MOP defeats many optimizations based on static analysis (for 
example, the applicable methods at a call site may change over time, 
even for the same types of arguments). Dynamic partial evaluation al- 
lows us to apply partial evaluation techniques even in the context of a 
meta-object protocol. We have implemented dynamic partial evaluation 
as part of a Dynamic Virtual Machine intended to host dynamic, reflec- 
tive object-oriented languages. In this paper, we present an implemen- 
tation of dynamic partial evaluation for a simple language - a lambda 
calculus extended with dynamic typing, subtyping, generic functions and 
multiple dispatch. 



1 Introduction 

Our goal is the efficient implementation of dynamic, higher-order, reflective 
object-oriented languages. Language features that we must support include dy- 
namic typing, runtime method definition, first class types (with subtyping), first 
class (and higher order) functions, and reflection. 

The concept of a meta-object protocol (MOP) [KdHDlj subsumes many of the 
above-listed features. A MOP is based upon computational reflection |Ma,eS7] 
- giving a program access to its internal structure and behavior and allowing 
programmatic manipulation of that structure and future behavior. A MOP en- 
tails that the entities being returned by reflective operations and manipulated 
programmatically be first class - thus the need for first class functions, types, 
classes, etc. We want to at once support the power and abstraction of meta- 
object protocols while at the same time providing efficient execution, especially 
when MOP-related features are not being used. For example, if the MOP pro- 
vides programmer control over method dispatch, but the programmer has not 
used that feature, the implementation should not instrument every method call 



O. Danvy and A. Filinski (Eds.): PADO-II, LNCS 2053, pp. 238-ESSI 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001 






in order to support the unused abstraction. The research described in this pa- 
per is part of a project to implement a Dynamic Virtual Machine (DVM) - a 
VM well-suited to hosting dynamic, reflective languages. The small formal
language λdvm presented in this paper is a simplified version of the DVM’s native
language, DVML. 

Partial evaluation is a general technique for specializing parts of a
program with respect to known values. For example, if a function has multiple
arguments, and that function is called frequently with the same values for some 
of the arguments, it may be worthwhile to create a specialized version of that 
function for those common argument values. Within the body of the specialized 
version of the function, we may be able to optimize away many computations that 
depend solely on the values of the arguments against which we are specializing. 
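The classic illustration is a power function specialized to a known exponent. The Python sketch below is our own example, not taken from the paper: the loop over the static argument n runs when the specializer runs, leaving residual code that only multiplies the dynamic argument x. Generating source text and compiling it with eval is just one simple way to build the residual function.

```python
def power(x, n):
    """General version: both arguments are dynamic."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Specialize power with respect to a known exponent n.

    The loop over n is executed now, at specialization time; only the
    multiplications that depend on the unknown x remain in the residual code."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"lambda x: {body}"
    return source, eval(source)


src, cube = specialize_power(3)
print(src)        # lambda x: x * x * x
print(cube(2))    # 8
```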

Experience has shown that many of the elements of a program exposed and 
made mutable by a MOP (e.g. the class structure, methods of a virtual function, 
method dispatch algorithm, etc.) in fact change only rarely at runtime. Thus 
we may guardedly treat these aspects of an application as constant and then 
apply partial evaluation techniques to remove computations that depend on 
these mostly-constant program properties. 
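One way to treat a mutable program property "guardedly" as constant is to specialize against it and check a cheap guard on every use. The sketch below is hypothetical (GenericFunction, its version counter and specialize are our names, not DVM interfaces): a method lookup is resolved once for a given receiver class, and the fast path is taken only while the method table is unchanged and the receiver has exactly that class.

```python
class GenericFunction:
    """Hypothetical generic function with a mutable, single-argument method table."""

    def __init__(self):
        self.methods = {}      # class -> implementation
        self.version = 0       # incremented on every change to the table

    def add_method(self, cls, fn):
        self.methods[cls] = fn
        self.version += 1

    def lookup(self, cls):
        """Slow path: search the class hierarchy for the nearest method."""
        for c in cls.__mro__:
            if c in self.methods:
                return self.methods[c]
        raise TypeError(f"no method for {cls.__name__}")


def specialize(gf, cls):
    """Resolve gf once for receivers of class cls, guarded by gf.version."""
    assumed_version = gf.version
    target = gf.lookup(cls)

    def fast(arg):
        if gf.version == assumed_version and type(arg) is cls:
            return target(arg)                 # assumption still holds
        return gf.lookup(type(arg))(arg)       # guard failed: generic path
    return fast
```

A real system would re-specialize when a guard fails rather than falling back on every call; the sketch only shows the guard itself.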



2 Related Work 

A number of researchers have been drawn to using partial evaluation techniques 
to eliminate overhead due to reflective operations. This is not surprising, as a 
recurrent theme in partial evaluation research has been the elimination of “inter- 
pretive overhead” (especially by using self application) , and reflective operations 
can be viewed as exposing an interpreter semantics to programs. 

Masuhara et al. fMM AY I1V1Y98| use partial evaluation to eliminate in- 
terpretive/reflective overhead in an object-oriented concurrent language. Par- 
tial evaluation is performed as part of compilation, and impressive results are 
recorded, with nearly all interpretive overhead removed in some cases. A limita- 
tion of their system is that all possible effects on meta-level functionality must 
be known at compile time. For example, if the evaluation of variables is to be
modified at runtime, the modifications must be known at compile
time.

In [HINOOj . Braux and Noye use partial evaluation techniques to eliminate 
reflection overhead in Java jd.lSHOOj . Java provides limited reflection function- 
ality, but no fine-grained MOP to manipulate the features covered by reflection. 
For example, a Java program may query what methods belong to a class but 
may not add or remove a method. Java does support coarse-grained redefinition 
via dynamic class loading. The partial evaluation rules presented in pjNflflj are 
specific to Java and would not work with a different language or runtime. 

Online partial evaluation (e.g. IP.iifASp residualizes with respect to actual 
values, as does dynamic partial evaluation. The machinery and challenges of dy- 
namic partial evaluation parallel those of online partial evaluation, using runtime 
analogs of the structures maintained by online partial evaluation (e.g. generic
functions for polyvariant specialization). 

Runtime partial evaluation, as in [( lINOtil f Vc X ;t)7j . defers some of the partial 
evaluation process until actual data is available at runtime. However, the scope 
and actions related to partial evaluation are largely decided at compile time. 
Dynamic partial evaluation goes further, deferring all partial evaluation activity 
to runtime. 

The technique of specializing a function on finer types than originally de- 
clared, in a language with dynamic dispatch, has been pursued by several re- 
search efforts - notably the Self KJha92l and Cecil IUCC95I projects. In IVCCh7l 
and ISLCM95I . Volanchi, Schultz, et al. use declarations and partial evaluation to 
achieve similar specialization for object-oriented programs. The implementation 
of dynamic partial evaluation presented in this paper also produces specialized 
versions of functions. 

The runtime partial evaluation in fv^ includes the notion of guards 
against future violation of invariants, and dynamic partial evaluation against 
“likely invariants” also requires such guards. The idea of optimistic optimization 
with respect to quasi-invariants has been pursued by Pu et al. in the Synthesis 
kernel [PM188| and then Synthetix |PAB~*~9!^ projects in the context of operating 
systems. 



3 Overview 

We present an overview of the main concepts used in this paper, including dy- 
namic partial evaluation, generic functions, and multiple dispatch. 



3.1 Dynamic Partial Evaluation 

Dynamic partial evaluation happens as a side-effect of evaluation. At runtime, 
an expression is evaluated with respect to an environment that contains both 
the usual dynamic bindings of identifiers to values, and also static bindings from 
identifiers to types. The static component of the environment corresponds to 
the symbolic environments of compile-time partial evaluation. In addition to 
producing a value for the expression, dynamic partial evaluation produces a 
residual version of the original expression based on the types in the environment. 
The folding that occurs to produce a residual expression is the same as in online 
partial evaluation - if the environment indicates that an identifier maps to a 
fully static value (i.e. has a singleton type), then an expression based on that 
identifier may be folded. Optimization may occur if a value is not fully static, 
but its type is known. For example, if accurate types are known for argument 
values before a call, dynamic type checks may be avoided. 
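The folding rule can be illustrated with a small Python sketch for arithmetic expressions. The expression encoding and the two-element type language (a singleton type ('eq', v) or a plain ('int',)) are assumptions made for this example and are far simpler than the language used in this paper. Evaluation returns a triple of value, residual expression and type: an identifier with a singleton type is replaced by its value, and an operation whose operands are both static is folded.

```python
# Expressions: ('lit', n), ('var', name), ('add', e1, e2), ('mul', e1, e2).
# The environment maps a name to (value, type); a type is ('eq', v), the
# singleton type of a fully static value, or ('int',) when only the type is known.

OPS = {'add': lambda a, b: a + b, 'mul': lambda a, b: a * b}


def dpe(expr, env):
    """Evaluate expr and, as a side effect, build its residual version.

    Returns an extended value (value, residual expression, type)."""
    tag = expr[0]
    if tag == 'lit':
        return expr[1], expr, ('eq', expr[1])
    if tag == 'var':
        value, typ = env[expr[1]]
        # An identifier bound to a fully static value folds to a literal.
        residual = ('lit', value) if typ[0] == 'eq' else expr
        return value, residual, typ
    v1, r1, t1 = dpe(expr[1], env)
    v2, r2, t2 = dpe(expr[2], env)
    value = OPS[tag](v1, v2)
    if t1[0] == 'eq' and t2[0] == 'eq':
        # Both operands static: fold the whole subexpression to its value.
        return value, ('lit', value), ('eq', value)
    return value, (tag, r1, r2), ('int',)


env = {'x': (42, ('eq', 42)), 'y': (5, ('int',))}
value, residual, typ = dpe(('add', ('mul', ('var', 'x'), ('lit', 2)), ('var', 'y')), env)
# value == 89, residual == ('add', ('lit', 84), ('var', 'y')), typ == ('int',)
```

The residual expression remains valid for any value of y, provided x is still 42.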

Note that dynamic partial evaluation does not suffer from the “infinite un- 
folding” issues of static partial evaluation. Because dynamic partial evaluation 
happens during evaluation, partial evaluation only loops if the application loops. 
While dynamic partial evaluation is defined at the expression level, control
and collection of the results of dynamic partial evaluation happen at the function
level. For example, suppose a function f(int x, Object y) is called repeatedly
with a value of 42 for x and with (different) instances of the Point class for y. For
one such invocation of f, the decision is made to evaluate the body with dynamic
partial evaluation enabled. For the duration of this call to f, every expression
is evaluated in an environment that maps x and y to both their actual concrete
values (42 for x, some Point instance for y), and also to the types eq(42)¹ and
Point. When the execution of f’s body is complete, we have both a concrete
value for this call, and also a new version of f’s body expression, specialized to
the signature (eq(42), Point). Within the body of the specialized version, we will
have performed optimizations assuming that x is 42 and that y is an instance of
the Point class. The new, specialized version of f is then added to f’s generic
function and will be selected whenever f is called with its first argument 42 and
its second argument an instance of Point.
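The following Python sketch shows how a specialized version of f can live alongside the general one in a generic function and be chosen by dispatch. Signatures may mention singleton types eq(v). The class names, the encoding of eq(v) as a tuple, the method bodies, and the "newest method first" selection rule (a crude stand-in for proper most-specific dispatch) are all assumptions of the sketch.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


def eq(v):
    """Singleton type containing exactly the value v."""
    return ('eq', v)


def conforms(value, typ):
    """Does value belong to typ (a Python class or a singleton type)?"""
    if isinstance(typ, tuple) and typ[0] == 'eq':
        return value == typ[1]
    return isinstance(value, typ)


class GenericFunction:
    """Methods are (signature, body) pairs; newer methods are tried first."""

    def __init__(self):
        self.methods = []

    def add_method(self, signature, body):
        self.methods.insert(0, (signature, body))

    def __call__(self, *args):
        for signature, body in self.methods:
            if len(signature) == len(args) and all(
                    conforms(a, t) for a, t in zip(args, signature)):
                return body(*args)
        raise TypeError("no applicable method")


f = GenericFunction()
# General method f(int x, Object y).
f.add_method((int, object), lambda x, y: ('general', x + y.x))
# Hypothetical residual body of f, specialized to the signature (eq(42), Point).
f.add_method((eq(42), Point), lambda x, y: ('specialized', 42 + y.x))
```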

Dynamic partial evaluation is intended to be used in conjunction with more 
static techniques. However, the dynamic features of the languages we are target- 
ing often preclude optimizations based on static analysis, and dynamic partial 
evaluation gives us a valuable tool for optimizing in the face of extreme dy- 
namism. 

3.2 Generic Functions and Multiple Dispatch 

The virtual machine in which we have implemented dynamic partial evalua- 
tion provides generic functions and multiple dispatch. Generic functions and 
multiple dispatch are used not just to model corresponding features in source 
programming languages, but are also an integral part of our implementation 
of dynamic partial evaluation. The key insight is that adding specialized meth- 
ods to a generic function at runtime corresponds to polyvariant specialization 
in compile-time partial evaluation, and the compile-time notion of sharing is 
handled at runtime via multiple dispatch. 

A generic function is a set of “regular” functions and selection criteria for 
choosing one of those functions (or signalling an error) for any given tuple of 
arguments. In a single dispatch language, such as C++ or Java, a generic function
corresponds to a virtual method, and the selection criterion is to find the
method defined in the class nearest above the receiver’s concrete class in the 
class hierarchy. Multiple dispatch generalizes single dispatch in that more than 
one argument may be used in the method selection process. For example, we 
may define a generic function foo consisting of the following methods: 
void foo(A this, A that);  // method 1
void foo(B this, B that);  // method 2
void foo(C this, B that);  // method 3

Suppose that B is a subclass of A and C is also a subclass of A (and all classes
are instantiable). Then the following sequence of code:

¹ The notation eq(v) denotes the singleton type containing exactly one value, namely v.
A anA = new A(); B aB = new B(); C aC = new C();
foo(aC, anA); foo(aB, aC); foo(aB, aB); foo(aC, aB);
invokes methods 1, 1, 2, and 3 in order.

Support for generic functions and multiple dispatch serves two distinct pur- 
poses in our system. First of all, we are interested in supporting languages with 
“interesting” dispatch mechanisms, including multiple dispatch such as in Dylan 
and CLOS. Secondly, it is via the generic function and multiple dispatch mech- 
anisms that we both cache and also invoke specialized versions of functions at 
runtime. Consider the first call to foo, above. If the first call, foo(aC, anA), is 
executed with dynamic partial evaluation enabled, we may get a new version of 
method 1, 

// (specialized version of method 1) 
void foo(C this, A that); // method 4 

Within the body of the new method, dynamic dispatch based on, or dynamic type 
checking of, the this argument may now be optimized under the assumption 
that this is an instance of class C. After adding the new method to the foo 
generic function, later calls to foo with arguments of class C and class A will 
resolve to this newly created method. 
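Multiple dispatch over the three foo methods can be sketched in Python by selecting the unique most specific applicable method. This is our illustration rather than DVM code, and ambiguity is simply reported as an error. Adding a method with signature (C, A), standing in for the specialized method 4, shows later calls with arguments of classes C and A resolving to it.

```python
class A: pass
class B(A): pass
class C(A): pass


def more_specific(s1, s2):
    """s1 is strictly more specific than s2: pointwise subclasses, not equal."""
    return s1 != s2 and all(issubclass(a, b) for a, b in zip(s1, s2))


class GenericFunction:
    def __init__(self):
        self.methods = []                   # (signature, method) pairs

    def add_method(self, signature, method):
        self.methods.append((signature, method))

    def select(self, *args):
        """Return the unique most specific applicable method."""
        applicable = [(sig, m) for sig, m in self.methods
                      if len(sig) == len(args)
                      and all(isinstance(a, t) for a, t in zip(args, sig))]
        best = [(sig, m) for sig, m in applicable
                if not any(more_specific(other, sig) for other, _ in applicable)]
        if len(best) != 1:
            raise TypeError("ambiguous" if best else "no applicable method")
        return best[0][1]


foo = GenericFunction()
foo.add_method((A, A), 1)
foo.add_method((B, B), 2)
foo.add_method((C, B), 3)

anA, aB, aC = A(), B(), C()
print([foo.select(*args) for args in [(aC, anA), (aB, aC), (aB, aB), (aC, aB)]])
# prints [1, 1, 2, 3]

# Specialized version of method 1, as produced for the call foo(aC, anA):
foo.add_method((C, A), 4)
print(foo.select(aC, anA))   # prints 4
```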

4 λdvm: a Dynamically-Typed Lambda Calculus with
Subtyping and Generic Functions

To clarify the mechanism of dynamic partial evaluation, we introduce λdvm, a
dynamically-typed lambda calculus with subtypes and generic functions. Figure 1
gives an operational semantics for this simple functional language. λdvm
is modeled after, but much simpler than, DVML, the “native” language of the 
Dynamic Virtual Machine. Among other things, DVML supports recursive func- 
tions, predicate types [EECnS!, and more complicated function signatures. The 
syntax of λdvm is given by the following grammar:

Exp ::= x | n | (if Exp Exp Exp)
      | (call Exp Exp+) | (gf-call Exp Exp+)
      | (lambda ([x Exp]+) Exp . Exp)

where x ranges over identifiers and n ranges over integers. Note the Exp phrases 
in lambda expressions for specifying the argument types and result type of a 
function. 

The evaluation relation ⇒ takes an expression e, an environment ρ, and a
type τ to an extended value V. Extended values are triples of a tagged value
v, an expression, and a type. Evaluation must satisfy the following constraints
(read ~ as "destructures into"):

[⇒ constraints] If $[\![e]\!]\,\rho\,\tau \Rightarrow V \sim (v, e', \tau')_{val}$, then

1. $v : \tau'$, and

2. $\tau' \le \tau$.
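⇒ ⊆ (Exp × Env × Type) × ExtValue,   ExtValue = TagValue × Exp × Type
e refers to the expression being evaluated in the current rule; _ means "don't care".
E is a finish function defined elsewhere.
v = some-operation binds v to the result of some-operation.
v ~ ... deconstructs value v into its component parts.

$$
\frac{\rho([\![x]\!]) = V;\ \ \mathit{check}(v, \tau)}
     {[\![x]\!]\,\rho\,\tau \Rightarrow E(e, (V), \rho, \tau)}
\qquad
\frac{\mathit{check}(n, \tau)}
     {[\![n]\!]\,\rho\,\tau \Rightarrow E(e, ((n, e, \mathit{eq}(n))_{val}), \rho, \tau)}
$$

$$
\frac{[\![e_0]\!]\,\rho\,\tau_{bool} \Rightarrow V_b \sim (\mathit{true}, \_, \_)_{val} \quad [\![e_1]\!]\,\rho\,\tau \Rightarrow V}
     {[\![(\texttt{if}\ e_0\ e_1\ e_2)]\!]\,\rho\,\tau \Rightarrow E(e, (V, V_b), \rho, \tau)}
\qquad
\frac{[\![e_0]\!]\,\rho\,\tau_{bool} \Rightarrow V_b \sim (\mathit{false}, \_, \_)_{val} \quad [\![e_2]\!]\,\rho\,\tau \Rightarrow V}
     {[\![(\texttt{if}\ e_0\ e_1\ e_2)]\!]\,\rho\,\tau \Rightarrow E(e, (V, V_b), \rho, \tau)}
$$

$$
\frac{\begin{array}{c}
[\![e_0]\!]\,\rho\,\tau_{fun} \Rightarrow V_f \sim (\mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, [\![e_f]\!], \rho_f), \_, \_)_{val} \\
[\![e_i]\!]\,\rho \Rightarrow V_i \sim (v_i, [\![e'_i]\!], \tau_i)_{val}, \quad i \in 1 \ldots n \\
[\![e_f]\!]\,\rho_f[x_i \mapsto (v_i, [\![e'_i]\!], \mathit{glb}(\tau_i, \tau_{arg_i}))_{val}]\ \mathit{glb}(\tau, \tau_{res}) \Rightarrow V, \quad i \in 1 \ldots n
\end{array}}
{[\![(\texttt{call}\ e_0\ e_1 \ldots e_n)]\!]\,\rho\,\tau \Rightarrow E(e, (V, V_f, (V_1, \ldots, V_n)), \rho, \tau)}
$$

$$
\frac{\begin{array}{c}
[\![e_0]\!]\,\rho\,\tau_{gf} \Rightarrow V_g \sim (\mathit{generic}(ms, (\tau_{g_1}, \ldots, \tau_{g_n}), \tau_{g_{res}}), \_, \_)_{val} \\
[\![e_i]\!]\,\rho\,\tau_{g_i} \Rightarrow V_i \sim (v_i, [\![e'_i]\!], \tau_i)_{val}, \quad i \in 1 \ldots n \\
\mathit{find\text{-}mam}(ms, (V_1, \ldots, V_n)) \sim (V_f, \mathit{static?}) \\
V_f \sim (\mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, [\![e_f]\!], \rho_f), \_, \_)_{val} \\
(\mathit{specialize?}, (\mathit{spec\text{-}type}_1, \ldots)) = \mathit{choose\text{-}specialization}(V_g, V_f, (V_1, \ldots, V_n)) \\
\mathit{arg\text{-}type}_i = (\mathit{specialize?} \Rightarrow \mathit{spec\text{-}type}_i,\ \mathit{glb}(\tau_i, \tau_{arg_i})), \quad i \in 1 \ldots n \\
[\![e_f]\!]\,\rho_f[x_i \mapsto (v_i, [\![e'_i]\!], \mathit{arg\text{-}type}_i)_{val}]\ \mathit{glb}(\tau, \tau_{res}) \Rightarrow V, \quad i \in 1 \ldots n
\end{array}}
{[\![(\texttt{gf-call}\ e_0\ e_1 \ldots e_n)]\!]\,\rho\,\tau \Rightarrow E(e, \mathit{F\text{-}vals}, \rho, \tau)}
$$

where F-vals = (V, V_g, (V_1, ..., V_n), V_f, static?, specialize?, (spec-type_1, ...)).

$$
\frac{\begin{array}{c}
[\![e_{\tau_i}]\!]\,\rho\,\tau_{type} \Rightarrow V_{\tau_i} \sim (v_{\tau_i}, \_, \_)_{val}, \quad i \in 1 \ldots n \\
[\![e_{\tau_{res}}]\!]\,\rho\,\tau_{type} \Rightarrow V_{\tau_{res}} \sim (v_{\tau_{res}}, \_, \_)_{val} \\
v_f = \mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (v_{\tau_1}, \ldots, v_{\tau_n}), v_{\tau_{res}}, [\![e_0]\!], \rho) \\
V_f = (v_f, e, \mathit{eq}(v_f))_{val}; \quad \mathit{check}(v_f, \tau)
\end{array}}
{[\![(\texttt{lambda}\ (x_1\ e_{\tau_1}, \ldots, x_n\ e_{\tau_n})\ e_{\tau_{res}}\,.\,e_0)]\!]\,\rho\,\tau \Rightarrow E(e, (V_f, (V_{\tau_1}, \ldots, V_{\tau_n}), V_{\tau_{res}}), \rho, \tau)}
$$

Fig. 1. λdvm, a Dynamically Typed λ Calculus with Subtyping and Generic
Functions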



That is, the type $\tau$ is a type constraint that the returned value, $v$, must satisfy.

A tagged value is a value for which the function type-of returns a type. In
Figure 1, upper case $V$'s range over extended values, and lower case $v$'s over
tagged values. There is syntax for creating integer, boolean, and closure tagged
values, and there are predefined functions for creating other tagged values,
including types, generic functions, mutable cells, and lists. A type may be one of
the predefined types or built from a type constructor; see Figure 2 for some
predefined values. Creating an extended value with tagged value $v$, expression $e$
and type $\tau$ is denoted $\langle v, e, \tau\rangle_{val}$.

Types are ordered as follows: all types are subtypes of $\top$, $\bot$ is a subtype of all
types, subtyping between types constructed using logical connectives is based on
implication, function types have the usual subtyping (contravariant in their argument
types), a singleton type $\mathit{eq}(v)$ is a subtype of $\mathit{type\text{-}of}(v)$, and there is a
predefined function, subtype, for creating subtypes of (multiple) other types.
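Part of this ordering can be sketched in Java under simplifying assumptions of ours: types are DvmTypes.Type objects, conjunction (&) is the only logical connective modeled, function types and the subtype constructor are left out, and glb falls back to building a conjunction when neither argument is a subtype of the other (it does not detect empty intersections such as int & bool). The names are hypothetical.

```java
import java.util.Objects;

// Hypothetical encoding of part of the lambda-dvm type ordering: top, bottom, int, bool,
// singleton types eq(v) and conjunctions (a & b). Function types, or/not and the
// subtype constructor are deliberately left out.
final class DvmTypes {
    abstract static class Type {}

    static final class Named extends Type {
        final String name;
        Named(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final Type TOP = new Named("top"), BOTTOM = new Named("bottom"),
                      INT = new Named("int"), BOOL = new Named("bool");

    // Singleton type eq(v): its only member is v.
    static final class Eq extends Type {
        final Object value;
        Eq(Object value) { this.value = value; }
        @Override public boolean equals(Object o) { return o instanceof Eq && Objects.equals(((Eq) o).value, value); }
        @Override public int hashCode() { return Objects.hashCode(value); }
        @Override public String toString() { return "eq(" + value + ")"; }
    }

    // Conjunctive type left & right.
    static final class And extends Type {
        final Type left, right;
        And(Type left, Type right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " & " + right + ")"; }
    }

    // type-of: the concrete type of a tagged value (only ints and booleans here).
    static Type typeOf(Object v) {
        if (v instanceof Integer) return INT;
        if (v instanceof Boolean) return BOOL;
        throw new IllegalArgumentException("no concrete type for " + v);
    }

    static boolean isSubtype(Type s, Type t) {
        if (s.equals(t) || t == TOP || s == BOTTOM) return true;
        if (t instanceof And) {                  // s <= a & b  iff  s <= a and s <= b
            And a = (And) t;
            return isSubtype(s, a.left) && isSubtype(s, a.right);
        }
        if (s instanceof And) {                  // a & b <= t  if  a <= t or b <= t
            And a = (And) s;
            return isSubtype(a.left, t) || isSubtype(a.right, t);
        }
        if (s instanceof Eq) return isSubtype(typeOf(((Eq) s).value), t); // eq(v) <= type-of(v)
        return false;
    }

    // glb: pick the smaller type when comparable, otherwise form the conjunction.
    static Type glb(Type a, Type b) {
        if (isSubtype(a, b)) return a;
        if (isSubtype(b, a)) return b;
        return new And(a, b);
    }

    public static void main(String[] args) {
        System.out.println(isSubtype(new Eq(3), INT));  // true
        System.out.println(glb(TOP, new Eq(3)));        // eq(3)
        System.out.println(isSubtype(INT, BOOL));       // false
    }
}
```

The conjunction rule also shows why static? compares with $\le$: a type such as eq(3) & int is a subtype of eq(3) without being equal to it.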



Predefined Type Values: $\top, \bot, \tau_{int}, \tau_{bool}, \tau_{fun}, \tau_{gf}$ for top, bottom, integers, booleans,
functions (closures), and generic functions, respectively.

Predefined Boolean Values: true, false.

Predefined Function Values:

• and(conjunct-types), or(disjunct-types), not(type), fun(arg-types, result-type): build
types from other types.

• type-of(v) returns the (concrete) type for a given tagged value v.

• check(v, $\tau$) returns true if type-of(v) is equal to or a subtype of $\tau$; otherwise, halts
with an error.

• static?(ext-val): for an extended value $\langle v, e, \tau\rangle_{val}$, returns true if $\tau \le \mathit{eq}(v)$, that
is, if the value is completely static. We use $\le$ rather than $=$ for type comparison
because our type system includes conjunctive types $\tau \& \tau'$ such that $(\tau \& \tau') \le \tau$
and $(\tau \& \tau') \le \tau'$.

• list(v1, ..., vn), closure(var, arg-type, result-type, body-exp, closure-env),
generic(list-of-methods, arg-type, return-type), subtype(list-of-supertypes), eq(val):
constructors for lists, closures, generic functions, subtypes, and singleton types,
respectively.

• add-method(generic, fun) adds a function (method) to a generic function.

• glb(list-of-types) constructs the greatest lower bound of its type arguments.

• find-mam(list-of-funs, arg-vals) selects the most applicable method given a set of
functions ("methods") and a vector of argument values. If there are no applicable
methods, find-mam halts with a "no applicable methods" error. If there are
multiple (non-comparable) most applicable methods, find-mam halts with an
"ambiguous methods" error. Otherwise, it returns the most applicable method and also
a flag indicating whether method selection was static (more on this in Section 4.2).

Fig. 2. Predefined values for λdvm



Environments $\rho$ map from identifiers to extended values. The dynamic context
is the projection of the environment as a map from identifiers to tagged values 
(i.e. the tagged value component of the mapped-to extended value). The static 
context is the projection of the environment as a map from identifiers to types 
(i.e. the type component of the mapped-to extended value). 

When evaluation of an expression is complete, the finish function $\mathcal{F}$ is called
with all the values relevant to the just-finished evaluation. $\mathcal{F}$ has type

$$\mathcal{F} : (\mathit{Exp}, \mathit{Vector}(\mathit{Value}), \mathit{Env}, \mathit{Type}) \to \mathit{ExtValue}$$

and must of course satisfy [$\Rightarrow$ constraints]; that is, if $\mathcal{F}(e, \langle V, \ldots\rangle, \rho, \tau) = V' \sim \langle v, e', \tau'\rangle_{val}$,
then $v : \tau'$ and $\tau' \le \tau$. For simple interpretation, we instantiate $\mathcal{F}$ as the finish function

$$\mathcal{F}_{simp}(e, \langle V, \ldots\rangle, \rho, \tau) = V,$$

which simply returns the first value in its value vector. Later, we will define a
finish function that implements dynamic partial evaluation.
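As an illustration of how the finish function factors out of the evaluator, the following Java sketch evaluates literals, variable references, and conditionals and hands every result to a pluggable finish function; SIMPLE plays the role of $\mathcal{F}_{simp}$. It is a hypothetical sketch, not the DVM's code: type contexts and check are omitted, types are plain strings, and the names FinishDemo, ExtValue, and Finish are ours.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an evaluator for literals, variables and conditionals that hands
// every result to a pluggable finish function, as the rules of Figure 1 do. Types are
// plain strings and type-context checks are omitted.
final class FinishDemo {
    static final class ExtValue {
        final Object value; final String expr; final String type;
        ExtValue(Object value, String expr, String type) { this.value = value; this.expr = expr; this.type = type; }
    }

    interface Exp {}
    static final class Num implements Exp { final int n; Num(int n) { this.n = n; } }
    static final class Var implements Exp { final String name; Var(String name) { this.name = name; } }
    static final class If implements Exp {
        final Exp test, thenBranch, elseBranch;
        If(Exp test, Exp thenBranch, Exp elseBranch) {
            this.test = test; this.thenBranch = thenBranch; this.elseBranch = elseBranch;
        }
    }

    // A finish function receives the expression and all values relevant to its evaluation.
    interface Finish { ExtValue apply(Exp e, List<ExtValue> values, Map<String, ExtValue> env); }

    // F_simp: simply return the first value of the value vector.
    static final Finish SIMPLE = (e, values, env) -> values.get(0);

    static ExtValue eval(Exp e, Map<String, ExtValue> env, Finish f) {
        if (e instanceof Num) {
            int n = ((Num) e).n;
            ExtValue lit = new ExtValue(n, Integer.toString(n), "eq(" + n + ")"); // fully static
            return f.apply(e, Arrays.asList(lit), env);
        }
        if (e instanceof Var) {
            ExtValue bound = env.get(((Var) e).name);
            if (bound == null) throw new IllegalStateException("unbound variable " + ((Var) e).name);
            return f.apply(e, Arrays.asList(bound), env);
        }
        If cond = (If) e;
        ExtValue test = eval(cond.test, env, f);
        if (!(test.value instanceof Boolean)) throw new IllegalStateException("test is not a boolean");
        ExtValue result = eval((Boolean) test.value ? cond.thenBranch : cond.elseBranch, env, f);
        return f.apply(e, Arrays.asList(result, test), env); // <V, V0>, as in the if rules
    }

    public static void main(String[] args) {
        Map<String, ExtValue> env = Map.of("x", new ExtValue(true, "x", "bool"));
        ExtValue r = eval(new If(new Var("x"), new Num(1), new Num(2)), env, SIMPLE);
        System.out.println(r.value); // 1
    }
}
```

Swapping SIMPLE for a different Finish implementation changes what is built alongside the value without touching eval, which is the design point of the finish function.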



4.1 Discussion of Evaluation Rules 

In Figure 1, the symbol $e$ always refers to the expression being evaluated. A
rule subexpression of the form $v = \mathit{some\text{-}operation}$ binds $v$ to the result of
some-operation for use elsewhere in the rule. An expression of the form $v \sim e'$
deconstructs the value $v$, binding the variables mentioned in $e'$. We use the
symbol "$-$" to indicate that we will not make use of the corresponding component
value.

var-ref: Evaluation of a variable reference $x$ looks up the identifier $x$ in the
environment, checks that it satisfies the current type context $\tau$, and sends the
value to $\mathcal{F}$ (which, in the basic interpreter, simply returns the value).
integer: Evaluation of a numeric literal $n$ checks the value $n$ against the current
type context $\tau$, constructs an extended value with a fully-static type, and then
sends the value to $\mathcal{F}$.

if-true, if-false: Evaluation of an if expression first evaluates the test
expression $e_0$ in a boolean type context, producing $V_0$. Either the true branch, $e_1$, or
the false branch, $e_2$, is evaluated, depending on the truth value of the tagged
value component of $V_0$, and then the resulting value is sent to $\mathcal{F}$, along with $V_0$.
call: Evaluation of a function call first evaluates the function expression $e_0$
in a $\tau_{fun}$ type context. We then destructure the function value (closure) into
its bound variables $([\![x_1]\!], \ldots, [\![x_n]\!])$, argument types $(\tau_{arg_1}, \ldots, \tau_{arg_n})$, result type
$\tau_{res}$, body $e_f$, and the closure's creation environment $\rho_f$. Then the call's argument
expressions $e_i, i \in 1 \ldots n$ are evaluated in type contexts of $\tau_{arg_i}$. For the values to
which the function arguments will be bound, we create new extended values with
types that are the greatest lower bound of the static type $\tau_i$ of the argument
value and the function's argument type $\tau_{arg_i}$ for each argument position $i$. Next
the body of the function, $e_f$, is evaluated with the appropriately extended closure
creation environment and with a type context that is the greatest lower bound
of the function's return type ($\tau_{res}$) and the current type context ($\tau$). Finally, all
relevant values are sent to $\mathcal{F}$.

gf-call: Evaluation of a generic function call first evaluates and destructures
its generic function argument. A generic function consists of a triple: a set $ms$
of functions, aka "methods", a vector of the argument types $\tau_{g_i}$, and the result
type $\tau_{g_{res}}$. All functions $\mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, e_f, \rho_f)$ in

$ms$ must satisfy the following constraints:

1. $\tau_{arg_i} \le \tau_{g_i}$ for each index $i$, and
The generic function call arguments are then evaluated with respect to the
argument types. Then the helper function find-mam is called to select the most
applicable method from the set $ms$ given the actual argument values $V_i$. The
function choose-specialization decides whether or not to produce a new method
for this generic function (using the results of dynamic partial evaluation). If so,
choose-specialization returns true and a vector of argument types. If method
specialization is not chosen, the arguments are assigned types as in a normal
function call: the greatest lower bounds of the declared argument types (of the most
applicable method) and the static types of the argument values. For simple
interpretation, choose-specialization always returns false. We discuss other scenarios
in Section 8. The body $e_f$ of the most applicable method is then invoked as in
a normal function call.

abstraction: Evaluation of a lambda expression first evaluates the expressions
for the argument types, $e_{\tau_i}$, and result type, $e_{\tau_{res}}$, all of which must satisfy the
$\tau_{type}$ type. Then a closure is created, the value is checked against the current type
context $\tau$, and the closure is sent to $\mathcal{F}$.

4.2 Generic Function Method Selection 

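The helper function $\mathit{find\text{-}mam}(ms, \langle V_1, \ldots, V_n\rangle)$ first finds the subset of $ms$ that
are applicable given the argument values $V_i \sim \langle v_i, e'_i, \tau_i\rangle_{val}$:

$$ms_{app} = \{\, f \mid f \in ms\ \&\ f \sim \mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, [\![e_f]\!], \rho_f)\ \&\ \mathit{check}(v_i, \tau_{arg_i}),\ i \in 1 \ldots n \,\}$$

If $ms_{app}$ is empty, a "no applicable method" error is flagged and execution halts.
Next a set of candidates for the most applicable method is derived (ideally a
singleton set):

$$\mathit{mams} = \{\, f \mid f \in ms_{app}\ \&\ f \sim \mathit{closure}(-, (\tau_{arg_1}, \ldots, \tau_{arg_n}), -, -, -)\ \&\ \neg\exists f' \in ms_{app},\ f' \neq f,\ \text{s.t.}\ f' \sim \mathit{closure}(-, (\tau'_{arg_1}, \ldots, \tau'_{arg_n}), -, -, -)\ \&\ \tau'_{arg_i} \le \tau_{arg_i}\ \text{for all}\ i \in 1 \ldots n \,\}$$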

For any two applicable methods in $ms_{app}$, if the argument types of one of the
methods are all $\le$ the corresponding argument types of the other, the second
(less specific) method is removed from consideration. It is not allowed for two 
methods in a generic function to have identical argument type vectors. The set of 
most applicable method candidates, mams, consists of applicable methods each 
of which has an argument type vector that is incomparable to the argument 
type vector of any other method in mams. If the set of most applicable method 
candidates, mams, has exactly one element, that element is returned; otherwise, 
an “ambiguous methods” error is signalled and execution halts. 
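The two filtering steps of find-mam can be sketched in Java as follows, assuming Java classes stand in for λdvm types (so check(v, $\tau$) becomes Class.isInstance and $\le$ becomes Class.isAssignableFrom). The class FindMam and its nested Method are hypothetical names, and errors are reported as exceptions rather than by halting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of find-mam, with Java classes standing in for lambda-dvm types.
final class FindMam {
    static final class Method {
        final String name;
        final Class<?>[] argTypes;
        Method(String name, Class<?>... argTypes) { this.name = name; this.argTypes = argTypes; }

        // check(v_i, tau_arg_i) for every argument position.
        boolean applicableTo(Object[] args) {
            if (args.length != argTypes.length) return false;
            for (int i = 0; i < args.length; i++) if (!argTypes[i].isInstance(args[i])) return false;
            return true;
        }

        // this <= other: every argument type of this is a subtype of other's.
        boolean atLeastAsSpecificAs(Method other) {
            for (int i = 0; i < argTypes.length; i++)
                if (!other.argTypes[i].isAssignableFrom(argTypes[i])) return false;
            return true;
        }
    }

    static Method findMam(List<Method> ms, Object... args) {
        List<Method> app = new ArrayList<>();                 // ms_app
        for (Method m : ms) if (m.applicableTo(args)) app.add(m);
        if (app.isEmpty()) throw new IllegalStateException("no applicable methods");

        List<Method> mams = new ArrayList<>();                // methods no other applicable one beats
        for (Method f : app) {
            boolean dominated = false;
            for (Method g : app) if (g != f && g.atLeastAsSpecificAs(f)) { dominated = true; break; }
            if (!dominated) mams.add(f);
        }
        if (mams.size() != 1) throw new IllegalStateException("ambiguous methods");
        return mams.get(0);
    }

    public static void main(String[] args) {
        List<Method> ms = List.of(new Method("g1", Object.class, Object.class),
                                  new Method("g2", Object.class, Integer.class));
        System.out.println(findMam(ms, 3, 5).name);   // g2
        System.out.println(findMam(ms, 3, "s").name); // g1
    }
}
```

Because no two methods may share an argument-type vector, "at least as specific" between distinct methods is strict, so a single undominated method is exactly the most applicable one.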

find-mam must also keep track of whether the selection of the most applicable
method can be accomplished using only the type information in the extended
value tuples. For example, suppose the relevant type hierarchy is $\tau_2 \le \tau_1$, the
single argument value $V$ has a concrete type of $\tau_2$, and there are methods for
the generic specialized on $\tau_2$ and $\tau_1$. If the static type of the argument value
is $\tau_1$ (that is, $V \sim \langle v, -, \tau_1\rangle_{val}$ and $\mathit{check}(v, \tau_2)$), the choice of method cannot
be determined statically (based on the static type alone), because another value of static
type $\tau_1$ that is not of type $\tau_2$ would select the $\tau_1$ method. However, if the static
type is $\mathit{eq}(v)$ or $\tau_2$, then the method selection is static.
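One conservative way to compute this flag is sketched below in Java: selection is treated as static when every method that might apply to some value of the static argument types applies to all of them, so that the applicable set (and hence the winner) cannot change with the run-time value. This criterion, the use of Java classes as types, and the restriction to classes rather than interfaces are our assumptions; the text above does not spell out find-mam's exact test.

```java
import java.util.List;

// Hypothetical sketch of the "was method selection static?" flag, using Java classes as
// types. Assumption: only classes (not interfaces) are used, so two types can share an
// instance exactly when one is a subclass of the other.
final class StaticDispatch {
    static boolean mayOverlap(Class<?> a, Class<?> b) {
        return a.isAssignableFrom(b) || b.isAssignableFrom(a);
    }

    // Conservative test: selection is static if every method that could apply to some value
    // of the static argument types applies to all such values. Then the applicable set, and
    // hence the most applicable method, cannot vary with the run-time values.
    // All signatures are assumed to have the same arity as staticTypes.
    static boolean isStaticSelection(List<Class<?>[]> methodSigs, Class<?>[] staticTypes) {
        for (Class<?>[] sig : methodSigs) {
            boolean potentially = true;  // some value of the static types may be applicable
            boolean definitely = true;   // every value of the static types is applicable
            for (int i = 0; i < staticTypes.length; i++) {
                potentially &= mayOverlap(staticTypes[i], sig[i]);
                definitely &= sig[i].isAssignableFrom(staticTypes[i]);
            }
            if (potentially && !definitely) return false; // outcome depends on the value
        }
        return true;
    }

    public static void main(String[] args) {
        // tau2 = Integer <= tau1 = Number, with methods specialized on both.
        List<Class<?>[]> sigs = List.of(new Class<?>[] {Number.class}, new Class<?>[] {Integer.class});
        System.out.println(isStaticSelection(sigs, new Class<?>[] {Number.class}));  // false
        System.out.println(isStaticSelection(sigs, new Class<?>[] {Integer.class})); // true
    }
}
```

The two calls in main mirror the example: static type $\tau_1$ (Number) leaves the choice open, while static type $\tau_2$ (Integer) fixes it.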



5 Instrumenting for Dynamic Partial Evaluation 

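Recall that extended values are triples $\langle v : \mathit{TaggedValue}, e : \mathit{Exp}, \tau : \mathit{Type}\rangle_{val}$. In the
basic evaluator, only the tagged value component, $v$, is explicitly used. When
dynamic partial evaluation is in effect, the expression and type components of an
extended value become meaningful. In particular, the following holds (we write
$\Rightarrow_{pe}$ to indicate $\Rightarrow[\mathcal{F}_{pe}/\mathcal{F}]$ and $\Rightarrow_{simp}$ to indicate $\Rightarrow[\mathcal{F}_{simp}/\mathcal{F}]$, we use a dash
($-$) for values we do not care about, and we write $v : \tau$ for $\mathit{check}(v, \tau) = \mathit{true}$):
[dpe constraints] If $e\ \rho\ \tau \Rightarrow_{pe} \langle v, e', \tau'\rangle_{val}$, then

1. $v : \tau' \le \tau$ (that is, [$\Rightarrow$ constraints]),

2. $e\ \rho\ \tau \Rightarrow_{simp} \langle v, -, -\rangle_{val}$, and

3. For every environment $\rho'$ that statically matches $\rho$,

if $e\ \rho'\ \tau'' \Rightarrow_{simp} \langle v', -, -\rangle_{val}$ then

(a) $e'\ \rho'\ \tau'' \Rightarrow_{simp} \langle v', -, -\rangle_{val}$, and

(b) $v' : \tau'$.

An environment $\rho'$ statically matches an environment $\rho$ if $\mathit{dom}(\rho) \subseteq \mathit{dom}(\rho')$
and for all $x \in \mathit{dom}(\rho)$, if $\rho(x) = \langle -, -, \tau\rangle_{val}$, then $\mathit{check}(\rho'(x), \tau) = \mathit{true}$. That is,
the types of the bound variables match, though the values may be different.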
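The statically-matches relation is a simple check over environments. The Java sketch below assumes environments are maps from names to extended values whose static types are Java classes (so singleton types $\mathit{eq}(v)$ are not representable); the names StaticMatch and staticallyMatches are illustrative only.

```java
import java.util.Map;

// Hypothetical sketch of "rho' statically matches rho". Java classes stand in for static
// types (singleton types eq(v) are not representable this way); residuals are omitted.
final class StaticMatch {
    static final class ExtValue {
        final Object value;
        final Class<?> type;
        ExtValue(Object value, Class<?> type) { this.value = value; this.type = type; }
    }

    static boolean staticallyMatches(Map<String, ExtValue> rhoPrime, Map<String, ExtValue> rho) {
        for (Map.Entry<String, ExtValue> binding : rho.entrySet()) {
            ExtValue other = rhoPrime.get(binding.getKey());
            if (other == null) return false;                                    // dom(rho) must be within dom(rho')
            if (!binding.getValue().type.isInstance(other.value)) return false; // check(rho'(x), tau)
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, ExtValue> rho = Map.of("d", new ExtValue(1, Integer.class));
        System.out.println(staticallyMatches(Map.of("d", new ExtValue(42, Integer.class)), rho));   // true
        System.out.println(staticallyMatches(Map.of("d", new ExtValue("one", String.class)), rho)); // false
    }
}
```

Only the static types recorded in $\rho$ matter; the values in $\rho$ itself are never consulted, which is what lets a residual computed under $\rho$ be reused under $\rho'$.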

The statement [dpe constraints] above specifies that if $e\ \rho\ \tau \Rightarrow_{pe} \langle v, e', \tau'\rangle_{val}$
then the value $v$ is of type $\tau'$ and evaluating the expression with the simple
evaluator $\Rightarrow_{simp}$ will return the same value $v$. Furthermore, evaluating the residualized
expression $e'$ in a statically matching environment $\rho'$ will produce the same value
as evaluating the original expression $e$ in $\rho'$. In other words, any optimizations
that were done to produce $e'$ from $e$ depended only on the types of the values in
the environment $\rho$, that is, the static context. Finally, all values produced by
evaluating $e'$ in any statically matching environment will be of type $\tau'$.

5.1 $\mathcal{F}_{pe}$: A Finish Function That Implements Dynamic Partial
Evaluation

We present a finish function $\mathcal{F}_{pe}$ that implements dynamic partial evaluation,
that is, $\mathcal{F}_{pe}$ satisfies [dpe constraints]. $\mathcal{F}_{pe}$ is defined by structural induction
on its expression argument. In the following, we present each case for $\mathcal{F}_{pe}$ along
with some discussion.

Variable Reference: $\mathcal{F}_{pe}([\![x]\!], \langle V\rangle, \rho, \tau)$, where $V \sim \langle v, e', \tau'\rangle_{val}$

$$\begin{array}{lll}
= & \textbf{if}\ \mathit{static?}(V)\ \&\ \mathit{expressible\text{-}as\text{-}literal?}(v) & \text{- if static and a literal,} \\
  & \quad \langle v, [\![v]\!], \tau'\rangle_{val} & \text{- we can fold to a constant} \\
  & \textbf{else}\ V &
\end{array}$$

Both the value and the static type for a variable reference come directly from
the environment. If the value is completely static and expressible as a literal
(that is, an integer or a boolean), the variable reference may be replaced by the
corresponding literal expression.
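A Java sketch of this case follows, assuming a boolean flag stands in for static?(V) and residual expressions are plain strings (both simplifications of ours; the names FinishVarRef and finishVar are hypothetical).

```java
// Hypothetical sketch of F_pe for a variable reference. The boolean fullyStatic stands in
// for static?(V), i.e. the static type being (a subtype of) eq(v); residuals are strings.
final class FinishVarRef {
    static final class ExtValue {
        final Object value;
        final String residual;
        final boolean fullyStatic;
        ExtValue(Object value, String residual, boolean fullyStatic) {
            this.value = value; this.residual = residual; this.fullyStatic = fullyStatic;
        }
    }

    // Only integers and booleans have literal syntax.
    static boolean expressibleAsLiteral(Object v) {
        return v instanceof Integer || v instanceof Boolean;
    }

    // looked is the extended value the environment maps the variable to.
    static ExtValue finishVar(ExtValue looked) {
        if (looked.fullyStatic && expressibleAsLiteral(looked.value))
            return new ExtValue(looked.value, String.valueOf(looked.value), true); // fold to a literal
        return looked; // otherwise pass the environment's extended value through unchanged
    }

    public static void main(String[] args) {
        System.out.println(finishVar(new ExtValue(3, "a", true)).residual);  // 3
        System.out.println(finishVar(new ExtValue(1, "d", false)).residual); // d
    }
}
```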

Literals: $\mathcal{F}_{pe}([\![n]\!], \langle V\rangle, \rho, \tau) = V$

A literal is always completely static; that is, a given literal expression will
always return the same value, no matter in what static context it is evaluated.
The reduction relation ensures that for literals, $\mathcal{F}$ will be called with a
fully-static value.

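Conditionals: $\mathcal{F}_{pe}([\![(\texttt{if}\ e_0\ e_1\ e_2)]\!], \langle V, V_0\rangle, \rho, \tau)$, where $V \sim \langle v, e', \tau'\rangle_{val}$ and
$V_0 \sim \langle v_0, e'_0, \tau_0\rangle_{val}$

$$\begin{array}{lll}
= & \textbf{if}\ \mathit{static?}(V_0) & \text{- if test val is a constant,} \\
  & \quad V & \text{- we can eliminate the conditional} \\
  & \textbf{else if}\ v_0 & \text{- otherwise, rebuild the conditional} \\
  & \quad \langle v, [\![(\texttt{if}\ e'_0\ e'\ e_2)]\!], \top\rangle_{val} & \\
  & \textbf{else} & \\
  & \quad \langle v, [\![(\texttt{if}\ e'_0\ e_1\ e')]\!], \top\rangle_{val} &
\end{array}$$

The decision whether or not to fold a conditional expression depends on whether
or not the test expression is completely static. If the test expression is completely
static, the if expression folds away. Otherwise, we rebuild the conditional with
the residuals of the test expression and the chosen branch, leaving the untaken
branch as it was. Note that the static type returned for the value is only $\top$.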
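The conditional case can be sketched the same way in Java. Again residuals are strings, the flag fullyStatic approximates static?, and the string "top" stands for the trivial static type $\top$; the class and method names are hypothetical. The test reproduces the residual bound to c in the example of Section 6.1.

```java
// Hypothetical sketch of F_pe for (if e0 e1 e2), with residual expressions as strings.
// v is the result of the branch that was taken, v0 the result of the test; e1 and e2 are
// the original (unevaluated) branch expressions.
final class FinishIf {
    static final class ExtValue {
        final Object value; final String residual; final String type; final boolean fullyStatic;
        ExtValue(Object value, String residual, String type, boolean fullyStatic) {
            this.value = value; this.residual = residual; this.type = type; this.fullyStatic = fullyStatic;
        }
    }

    static ExtValue finishIf(ExtValue v, ExtValue v0, String e1, String e2) {
        if (v0.fullyStatic) return v;                 // constant test: the conditional folds away
        String rebuilt = (Boolean) v0.value
            ? "(if " + v0.residual + " " + v.residual + " " + e2 + ")"   // then-branch was taken
            : "(if " + v0.residual + " " + e1 + " " + v.residual + ")";  // else-branch was taken
        return new ExtValue(v.value, rebuilt, "top", false);  // only the trivial static type remains
    }

    public static void main(String[] args) {
        ExtValue five = new ExtValue(5, "5", "eq(5)", true);
        ExtValue dynamicTest = new ExtValue(true, "(> d 0)", "bool", false);
        System.out.println(finishIf(five, dynamicTest, "5", "6").residual); // (if (> d 0) 5 6)
    }
}
```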

An important optimization for conditionals is the case when $e_0$ is a variable
reference. In that case, the chosen branch may be evaluated with the environment 
mapping the test variable to the singleton type of either eq{true) or eq{false). This 
is in fact always the case for the Dynamic Virtual Machine, where expressions 
are all in essentially static single assignment form, but we do not present that 
optimization here. 

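Function call: $\mathcal{F}_{pe}([\![(\texttt{call}\ e_0\ e_1 \ldots e_n)]\!], \langle V, V_f, \langle V_1, \ldots, V_n\rangle\rangle, \rho, \tau)$, where
$V \sim \langle v, e', \tau'\rangle_{val}$, $V_f \sim \langle v_f, e'_f, -\rangle_{val}$, $V_i \sim \langle v_i, e'_i, \tau_i\rangle_{val}$, $i \in 1 \ldots n$, and

$v_f \sim \mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, [\![e_f]\!], \rho_f)$

$$\begin{array}{lll}
= & \textbf{if}\ \mathit{static?}(V_f) & \text{- if fun is a constant,} \\
  & \quad \ldots\ \mathit{finish\text{-}known\text{-}function}\ \ldots & \\
  & \textbf{else}\ \langle v, [\![(\texttt{call}\ e'_f\ e'_1 \ldots e'_n)]\!], \top\rangle_{val} & \text{- here if fun value is not constant}
\end{array}$$

where the code fragment finish-known-function is:

$$\begin{array}{lll}
 & \textbf{if}\ \mathit{static?}(V) & \text{- if return val is a constant,} \\
 & \quad \langle v, \mathit{val2exp}(v), \tau'\rangle_{val} & \text{- fold call} \\
 & \textbf{else if}\ \mathit{inline?}(v_f) & \text{- fun is a constant, may choose to inline} \\
 & \quad \mathit{inline\text{-}call}(V, V_f, \langle V_1, \ldots, V_n\rangle, \rho, \tau) & \\
 & \textbf{else}\ \langle v, [\![(\texttt{call}\ e'_f\ e'_1 \ldots e'_n)]\!], \tau_{res}\rangle_{val} & \text{- fun is constant, but not inlining}
\end{array}$$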
Finishing a function call involves choosing one of several options:

1. Fold the call to either a literal or a variable reference expression. 
This can only be done if the value of the call is completely static. Folding 
to a variable reference may involve extending the current environment with 
a new binding. Residualizing a constant is handled by the function val2exp, 
whose logic is outside the scope of this paper. 

2. Inline the residualized body of the closure at the call site; pre- 
ceded by any argument type checks that did not statically succeed. Inlining 
may also involve extending the current environment, to handle references 
to variables closed over by the inlined function. Inlining is handled by the 
function inline-call, again outside the scope of this paper. 

3. Replace the call by an unchecked call; preceded by any argument 
type checks that did not statically succeed. 

4. Leave the call as is. 

For option 3, the language needs to be extended with operations that do less
type checking than base λdvm, but we do not discuss those in this paper. Note
that if the function value is static, but we choose not to inline, we may still use 
the declared return type of the function for the static type of the returned value. 
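The control flow of the function-call case reduces to a small decision, sketched here in Java with hypothetical names. It models only the branches of $\mathcal{F}_{pe}$ given above (folding to a literal, inlining, keeping a known call with its declared return type, rebuilding an unknown call); folding to a variable reference and unchecked calls are not modeled.

```java
// Hypothetical sketch of the decision made when finishing a (call e0 e1 ... en):
// which branch of F_pe applies, given what is known statically.
final class FinishCall {
    enum Action { FOLD_TO_LITERAL, INLINE, CALL_WITH_DECLARED_RETURN_TYPE, REBUILD_CALL }

    static Action decide(boolean funStatic, boolean resultStatic, boolean inlineWanted) {
        if (!funStatic) return Action.REBUILD_CALL;           // (call e'f e'1 ... e'n), type top
        if (resultStatic) return Action.FOLD_TO_LITERAL;      // val2exp(v), type tau'
        if (inlineWanted) return Action.INLINE;               // inline-call(...)
        return Action.CALL_WITH_DECLARED_RETURN_TYPE;         // rebuilt call, static type tau_res
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, false)); // CALL_WITH_DECLARED_RETURN_TYPE
    }
}
```

Note that folding is only considered when the function itself is static, matching the nesting of finish-known-function under static?($V_f$).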
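Generic Function Call: $\mathcal{F}_{pe}([\![(\texttt{gf-call}\ e_0\ e_1 \ldots e_n)]\!], \mathit{F\text{-}vals}, \rho, \tau)$, where
$\mathit{F\text{-}vals} = \langle V, V_g, \langle V_1, \ldots, V_n\rangle, V_f, \mathit{static?}, \mathit{specialize?}, \langle \mathit{spec\text{-}type}_1, \ldots\rangle\rangle$,

$V \sim \langle v, e', \tau'\rangle_{val}$, $V_g \sim \langle v_g, e'_g, -\rangle_{val}$, $v_g \sim \mathit{generic}(ms, (\tau_{g_1}, \ldots, \tau_{g_n}), \tau_{g_{res}})$,

$V_f \sim \langle \mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), (\tau_{arg_1}, \ldots, \tau_{arg_n}), \tau_{res}, [\![e_f]\!], \rho_f), -, -\rangle_{val}$,

and $V_i \sim \langle v_i, e'_i, \tau_i\rangle_{val}$, $i \in 1 \ldots n$

$$\begin{array}{lll}
= & \textbf{if}\ \mathit{specialize?} & \\
  & \quad \mathit{add\text{-}method}(v_g, \mathit{closure}(([\![x_1]\!], \ldots, [\![x_n]\!]), \langle \mathit{spec\text{-}type}_1, \ldots\rangle, \tau_{res}, e', \rho_f)) & \\
  & \textbf{if}\ \mathit{static?}(V_g) & \text{- if generic is a constant,} \\
  & \quad \textbf{if}\ \mathit{static?} & \text{- if can statically dispatch} \\
  & \quad\quad \ldots\ \mathit{finish\text{-}known\text{-}function}\ \ldots & \\
  & \quad \textbf{else}\ \langle v, [\![(\texttt{gf-call}\ e'_g\ e'_1 \ldots e'_n)]\!], \top\rangle_{val} & \text{- can't statically dispatch} \\
  & \textbf{else}\ \langle v, e, \top\rangle_{val} & \text{- here if generic is not a constant}
\end{array}$$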

If method specialization was chosen, finishing a generic function call adds a
new method to the generic function, using the residualized body of the applied
method for the closure body, the argument types determined by
choose-specialization, and other attributes from the original closure.

Technical note. Actually, the closure environment is extended with bindings for 
any static values exposed during inlining that are not expressible as literals and 
instead bound to fresh variables. 

In the Dynamic Virtual Machine, the logic abstracted by choose-specialization 
simply uses programmer-defined rules, in the spirit of to decide when 

to create specialized versions of generic function methods. 



^ In the DVM, method addition is contingent on there being some useful optimization 
during specialization. 
If method specialization was not chosen, then we test whether the generic
function value is constant and the most applicable method can be chosen based
strictly on the static types of its arguments. In that case, finishing proceeds as in
the case for a regular function call when the function argument is static. Recall
from Section 4.2 that method selection is considered not static if at least one of
the methods of the generic function is not applicable according to the concrete
argument types, but is potentially applicable according to the static types of the
arguments.
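The following Java sketch illustrates the effect of method specialization on later dispatch, in the spirit of the foo example above. It is a simplification under stated assumptions: Java classes stand in for λdvm types, the decision to specialize is passed in as a flag rather than computed by choose-specialization, and the added method reuses the original body instead of a residualized one. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a generic function whose method list grows at run time with methods
// specialized on the arguments' concrete classes. Java classes stand in for lambda-dvm
// types; the specialized body is simply the original body (no residualization here).
final class SpecializingGeneric {
    static final class Method {
        final String name;
        final Class<?>[] sig;
        final Function<Object[], Object> body;
        Method(String name, Class<?>[] sig, Function<Object[], Object> body) {
            this.name = name; this.sig = sig; this.body = body;
        }
        boolean applicable(Object[] args) {
            if (args.length != sig.length) return false;
            for (int i = 0; i < args.length; i++) if (!sig[i].isInstance(args[i])) return false;
            return true;
        }
        boolean atLeastAsSpecificAs(Method o) {
            for (int i = 0; i < sig.length; i++) if (!o.sig[i].isAssignableFrom(sig[i])) return false;
            return true;
        }
    }

    final List<Method> methods = new ArrayList<>();

    // add-method; identical argument-type vectors are rejected.
    void addMethod(Method m) {
        for (Method old : methods)
            if (Arrays.equals(old.sig, m.sig)) throw new IllegalArgumentException("duplicate argument types");
        methods.add(m);
    }

    // Most applicable method, computed as in find-mam.
    Method select(Object... args) {
        List<Method> app = new ArrayList<>();
        for (Method m : methods) if (m.applicable(args)) app.add(m);
        if (app.isEmpty()) throw new IllegalStateException("no applicable methods");
        List<Method> mams = new ArrayList<>();
        for (Method f : app) {
            boolean dominated = false;
            for (Method g : app) if (g != f && g.atLeastAsSpecificAs(f)) dominated = true;
            if (!dominated) mams.add(f);
        }
        if (mams.size() != 1) throw new IllegalStateException("ambiguous methods");
        return mams.get(0);
    }

    // gf-call; when specialize is true, also add a method keyed on the concrete classes.
    Object call(boolean specialize, Object... args) {
        Method m = select(args);
        Object result = m.body.apply(args);
        if (specialize) {
            Class<?>[] concrete = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) concrete[i] = args[i].getClass();
            if (!Arrays.equals(concrete, m.sig)) addMethod(new Method(m.name + "-specialized", concrete, m.body));
        }
        return result;
    }

    public static void main(String[] args) {
        SpecializingGeneric foo = new SpecializingGeneric();
        foo.addMethod(new Method("method1", new Class<?>[] {Object.class, Object.class}, a -> "general"));
        foo.call(true, "c", 42);                        // adds method1-specialized on (String, Integer)
        System.out.println(foo.select("c", 7).name);    // method1-specialized
        System.out.println(foo.select(1.5, "x").name);  // method1
    }
}
```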

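Abstraction: $\mathcal{F}_{pe}([\![(\texttt{lambda}\ (x_1\ e_{\tau_1}, \ldots, x_n\ e_{\tau_n})\ e_{\tau_{res}}\ .\ e_0)]\!],$

$\langle V_f, \langle V_{\tau_1}, \ldots, V_{\tau_n}\rangle, V_{\tau_{res}}\rangle, \rho, \tau)$, where $V_f \sim \langle v_f, e, \tau_f\rangle_{val}$,

$V_{\tau_i} \sim \langle -, e'_{\tau_i}, -\rangle_{val}$ for $i \in 1 \ldots n$, and $V_{\tau_{res}} \sim \langle -, e'_{\tau_{res}}, -\rangle_{val}$

$$= \langle v_f, [\![(\texttt{lambda}\ (x_1\ e'_{\tau_1}, \ldots, x_n\ e'_{\tau_n})\ e'_{\tau_{res}}\ .\ e_0)]\!], \tau_f\rangle_{val}$$

Lambda abstraction produces a closure value. To finish an abstraction, we return
the closure, a rebuilt abstraction expression using the residual expressions from
the type expressions, and the singleton type constructed by $\Rightarrow$.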

6 Examples of Dynamic Partial Evaluation 

We give a few examples of how dynamic partial evaluation works in practice. 



6.1 A Contrived Example 

Suppose we are dynamically partially evaluating the following expression (where
$(\texttt{let}\ (x\ \tau) = e_0\ \texttt{in}\ e_1)$ is a macro for $((\texttt{lambda}\ (x\ \tau)\ \top\ .\ e_1)\ e_0)$):

(let (a ⊤) = (if (> b 0) 3 4) in
  (let (c τ_int) = (if (> d 0) 5 6) in
    (gf-call g a c)))
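in the following environment:

$$\begin{array}{l}
b \mapsto \langle 1, [\![1]\!], \mathit{eq}(1)\rangle_{val} \\
d \mapsto \langle 1, -, \tau_{int}\rangle_{val} \\
g \mapsto \langle \mathit{generic}((g_1, g_2), (\top, \top), \top), -, \mathit{eq}(g)\rangle_{val}
\end{array}$$

where $g_1 = \mathit{closure}((x, y), (\top, \top), \top, [\![(+\ x\ y)]\!], \rho_{g_1})$
and $g_2 = \mathit{closure}((x, y), (\top, \tau_{int}), \top, [\![x]\!], \rho_{g_2})$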

The identifier $b$ is statically bound to the integer 1. $d$ is also bound to 1, but
has static type $\tau_{int}$. The identifier $g$ is statically bound to a generic function of
two methods: one with the most general specializers, and one specialized on
integer values for its second argument. The first expression to evaluate is (> b 0).
Because $b$ is completely static, the result value, true, is completely static and the if
expression can be folded. The result of the if expression is the fully static value 3.
The identifier $a$ is bound to the value 3, with residual expression 3, and gets static
type $\mathit{eq}(3)$, which is the greatest lower bound (glb) of the declared type $\top$ and
the result type $\mathit{eq}(3)$ of the expression. The comparison (> d 0) also evaluates to
true, but the value is not completely static because the static type of $d$ is $\tau_{int}$. The
identifier $c$ is bound to the value 5, residual expression (if (> d 0) 5 6), and gets
static type $\tau_{int}$, which is the glb of the declared type $\tau_{int}$ and the type of the result
of the expression, namely $\top$. The most applicable method for the call to $g$ is $g_2$.
Furthermore, the static type $\tau_{int}$ of variable $c$ is sufficient to statically select the most
applicable method at the call, so the gf-call can be replaced by a simple call to
$g_2$. When the body of $g_2$ is executed, it returns the value of its first argument, $a$,
which is fully static. Because the method can be statically selected, and because
the result of the call is a fully static value, the whole gf-call expression can be
folded to the literal expression 3. Thus the expression residualizes to:

(let (c τ_int) = (if (> d 0) 5 6) in 3)

Dead variable elimination may eliminate the now useless let construct.

^ $g \mapsto \langle -, -, \mathit{eq}(g)\rangle_{val}$ means that $g$ is completely static.

6.2 Example: Dynamic Partial Evaluation of Reflection in Java 

In [BN00], Braux and Noyé use partial evaluation techniques to eliminate
reflection overhead in Java. The rules they introduce are specific to the reflection
API of Java. Dynamic partial evaluation provides a general mechanism that
automatically eliminates the reflection overhead addressed by Braux and Noyé.
Following is the main example from [BN00]:

public static void dumpFields(Object anObj)
        throws java.lang.IllegalAccessException {
    Field[] fields = anObj.getClass().getFields();
    for (int i = 0; i < fields.length; i++)
        System.out.println(fields[i].getName() + ": " + fields[i].get(anObj));
}

If dumpFields is called often on a specific class, say Point, it is worthwhile 
to create a specialized version of dumpFields specific to Point. Assume for 
now that the Point class has no subclasses. Within the specialized version of 
dumpFields, most of the reflection overhead can be folded away - the call to 
getClass will always return the Point class, and getFields will always return 
an array containing the x and y Fields. Of course, the actual values of x and y 
are dynamic - that is, they will vary between invocations of the method. After 
partial evaluation, the specialized method should be something like: 

public static void dumpFieldsPoint(Point anObj) {
    System.out.println("x: " + anObj.x);
    System.out.println("y: " + anObj.y);
}

We have an implementation of the relevant parts of the Java runtime, and a 
translation from Java into the Dynamic Virtual Machine (DVM). The above 
example, in the case that Point has no subclasses, folds to DVM code analo- 
gous to that given above, and a method specialized on Point is added to the 
dumpFields generic function. Thereafter, calls to dumpFields with Point argu- 
ments automatically select the optimized version. 
Furthermore, if later calls to dumpFields take place within the context of
specializing some other generic function, and the argument is statically bound 
to Point, the optimized code may be inlined into the calling method, and so on. 

7 Optimistic Dynamic Partial Evaluation 

As was mentioned in the introduction, we want to optimize with respect to 
“quasi-invariants” - in particular, elements of the meta-object protocol (MOP) 
that are technically mutable but rarely modified in practice. 

In the Dynamic Virtual Machine, there are two mutable datatypes: cells 
and generic functions. A cell contains a single value that may be changed, and 
a generic function may be modified by updating its method list. As dynamic 
partial evaluation proceeds, each optimization (folding, inlining) notes any cells 
or generic functions that have been referenced. When a new method is added to 
a generic function as a result of dynamic partial evaluation, all referenced cells 
and referenced generic functions are instrumented to undo the optimization if 
mutated. 

In the example from Section 6.1, suppose the expression exists in a method of
the generic function named h. After the body of the method finishes, a new
version of the method, including the residual expression from the example, is added
to h's method list. In this case, a dependency is recorded between the generic
function g and the newly-added method. If at some later point add-method is
called on the g generic function, the newly added method is removed from h's
method list.

In the example from Section 6.2, dependencies are created between the generic
functions getClass, getFields, length, getName, and get and the specialized
version of dumpFields. Adding a new subclass of Point will add new methods to
the generic functions getFields, getName, and get, thus removing the specialized
method.

In fact, with our current dependency tracking, adding any new class to the 
system will cause the specialized method of dumpFields to be removed, because 
our level of granularity is only at the generic function level, as opposed to specific 
tuples of type hierarchies. 

In the Dynamic Virtual Machine, cells and generic functions exposed by the 
MOP are considered “quasi-invariant” and dynamic partial evaluation tracks 
references to them. 
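The generic-function-level dependency tracking described above can be sketched in Java as follows. Generic functions are reduced to named lists of method names, the names are hypothetical, and, as a simplifying assumption, removing a specialized method is not itself treated as a mutation that triggers further invalidation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of generic-function-level dependency tracking: each specialized
// method records the generic functions its optimizations consulted, and mutating one of
// those generics (add-method) removes the specialized method again.
final class DependencyTracking {
    static final class Generic {
        final String name;
        final List<String> methods = new ArrayList<>();  // method names stand in for closures
        Generic(String name) { this.name = name; }
    }

    // For each referenced generic: the (owner, specialized method) pairs that relied on it.
    private final Map<Generic, List<Map.Entry<Generic, String>>> dependents = new HashMap<>();

    // Install a specialized method in owner, recording the generics it depends on.
    void addSpecialized(Generic owner, String method, Collection<Generic> referenced) {
        owner.methods.add(method);
        for (Generic g : referenced)
            dependents.computeIfAbsent(g, k -> new ArrayList<>()).add(Map.entry(owner, method));
    }

    // add-method on g, then undo every specialization that depended on g.
    // Removals are not propagated transitively in this sketch.
    void addMethod(Generic g, String method) {
        g.methods.add(method);
        List<Map.Entry<Generic, String>> deps = dependents.remove(g);
        if (deps == null) return;
        for (Map.Entry<Generic, String> d : deps) d.getKey().methods.remove(d.getValue());
    }

    public static void main(String[] args) {
        DependencyTracking dt = new DependencyTracking();
        Generic g = new Generic("g"), h = new Generic("h");
        h.methods.add("h-original");
        dt.addSpecialized(h, "h-specialized", List.of(g));
        System.out.println(h.methods);   // [h-original, h-specialized]
        dt.addMethod(g, "g3");
        System.out.println(h.methods);   // [h-original]
    }
}
```

The coarse key (a whole generic function) is what makes the dumpFields specialization fragile: any add-method on getFields, getName, or get invalidates it, whatever class the new method is for.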

8 Controlling Dynamic Partial Evaluation

Dynamic partial evaluation is subject to the “infinite specialization” problem 
of polyvariant specialization systems. The Dynamic Virtual Machine associates 
a set of “specialization rules” with each generic function; each rule includes a 
predicate and a specialization signature. When a generic function is invoked, 
and after the most applicable method has been chosen, the specialization rules 
of the generic are matched against the values of the method, the argument
values, and the argument values’ static types. If a rule matches, it specifies the 
signature against which specialization should occur. This is similar in spirit to 
the specialization classes of Ema. 

Currently, the specialization rules are given by the programmer. To do the
example from Section 6.2, we added a rule to the dumpFields generic that matches
methods specialized to Object and produces specialization against the concrete
type of the argument.
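A specialization rule can be pictured as a predicate paired with a signature-producing function. The Java sketch below encodes the dumpFields rule just described; the names are hypothetical, Java classes stand in for types, and the argument values' static types (which real rules may also inspect) are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of per-generic specialization rules: a predicate over the selected
// method and the arguments, plus a function producing the specialization signature.
final class SpecializationRules {
    static final class Context {
        final Class<?>[] methodSig;   // argument types of the most applicable method
        final Object[] args;          // actual argument values
        Context(Class<?>[] methodSig, Object[] args) { this.methodSig = methodSig; this.args = args; }
    }

    static final class Rule {
        final Predicate<Context> matches;
        final Function<Context, Class<?>[]> signature;
        Rule(Predicate<Context> matches, Function<Context, Class<?>[]> signature) {
            this.matches = matches; this.signature = signature;
        }
    }

    final List<Rule> rules = new ArrayList<>();

    // Stand-in for choose-specialization: the first matching rule supplies the signature;
    // an empty result means "do not specialize".
    Optional<Class<?>[]> choose(Context c) {
        for (Rule r : rules) if (r.matches.test(c)) return Optional.of(r.signature.apply(c));
        return Optional.empty();
    }

    // The dumpFields rule: a method specialized on Object is specialized against the
    // concrete class of its (single) argument.
    static Rule objectToConcrete() {
        return new Rule(
            c -> c.methodSig.length == 1 && c.methodSig[0] == Object.class,
            c -> new Class<?>[] {c.args[0].getClass()});
    }

    public static void main(String[] args) {
        SpecializationRules dumpFieldsRules = new SpecializationRules();
        dumpFieldsRules.rules.add(objectToConcrete());
        Context ctx = new Context(new Class<?>[] {Object.class}, new Object[] {"a point"});
        System.out.println(dumpFieldsRules.choose(ctx).map(sig -> sig[0].getSimpleName()).orElse("none")); // String
    }
}
```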

Our goal is to dynamically generate specialization rules based on profile in- 
formation, as done in mm- 

9 Tracking Side Effects 

As for any partial evaluator for an imperative language, dynamic partial
evaluation must avoid folding function calls to values when there may be side effects
involved. The Dynamic Virtual Machine handles this by threading a side-effecting?
flag through evaluation of an expression. Thus, a function call may return a fully
static value, but the call expression cannot be residualized to a constant if there 
were side-effecting operations involved. For example, a fully static call on fully 
static argument values that returns a newly-allocated list of those argument val- 
ues cannot be folded to the list itself, because next time that call should return 
another newly allocated list (containing the same values). For a function call 
with a non-static function, and for a conditional with a dynamic test, dynamic 
partial evaluation must be pessimistic about whether side effects occur. In the 
Dynamic Virtual Machine, the primitive operations define whether or not they 
are side-effecting. 
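The effect of the side-effecting? flag on folding can be sketched in Java: results carry a fully-static bit and a side-effecting bit, each primitive declares whether it has effects, and only results that are static and effect-free may be folded. The names and the two primitives are illustrative assumptions, not the DVM's actual operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of threading a side-effecting? flag: a result may be folded to a
// constant only if it is fully static AND no side-effecting operation was involved.
final class SideEffects {
    static final class Result {
        final Object value;
        final boolean fullyStatic;
        final boolean sideEffecting;
        Result(Object value, boolean fullyStatic, boolean sideEffecting) {
            this.value = value; this.fullyStatic = fullyStatic; this.sideEffecting = sideEffecting;
        }
        boolean foldable() { return fullyStatic && !sideEffecting; }
    }

    // Each primitive declares whether it is side-effecting (e.g. allocation).
    static final class Primitive {
        final String name;
        final boolean sideEffecting;
        final Function<List<Object>, Object> impl;
        Primitive(String name, boolean sideEffecting, Function<List<Object>, Object> impl) {
            this.name = name; this.sideEffecting = sideEffecting; this.impl = impl;
        }
    }

    static Result apply(Primitive p, List<Result> args) {
        List<Object> values = new ArrayList<>();
        boolean allStatic = true;
        boolean effects = p.sideEffecting;
        for (Result r : args) {
            values.add(r.value);
            allStatic &= r.fullyStatic;
            effects |= r.sideEffecting;      // effects in any argument propagate upward
        }
        return new Result(p.impl.apply(values), allStatic, effects);
    }

    public static void main(String[] args) {
        Primitive plus = new Primitive("+", false, xs -> (Integer) xs.get(0) + (Integer) xs.get(1));
        Primitive list = new Primitive("list", true, xs -> new ArrayList<Object>(xs)); // fresh allocation
        List<Result> staticArgs = List.of(new Result(1, true, false), new Result(2, true, false));
        System.out.println(apply(plus, staticArgs).foldable()); // true
        System.out.println(apply(list, staticArgs).foldable()); // false
    }
}
```

The list case mirrors the example in the text: the result is fully static, yet folding it would make every later evaluation share one list instead of allocating a fresh one.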

10 Conclusions and Future Work 

Dynamic partial evaluation is a technique for instrumenting interpretation in 
order to perform partial evaluation actions as a side effect of evaluation. This is 
accomplished by interpreting expressions in an environment that maps identifiers 
not only to values but also to types. The type of a variable can be understood as 
“how much information dynamic partial evaluation is allowed to assume about 
this binding.” 

Dynamic partial evaluation has been implemented as part of a Dynamic 
Virtual Machine designed to host dynamic, reflective, higher-order languages 
with subtyping. In the current implementation, dynamic partial evaluation is 
always “on” - that is, evaluation always creates residual expressions and tracks 
static types. We would like to be able to dynamically switch between dynamic 
partial evaluation and simple interpretation - suffering the overhead of dynamic 
partial evaluation only when we know we will use the results. 

As far as when to enable dynamic partial evaluation of a method, we cur- 
rently specify rules by hand, in the spirit of mwi\ . We plan to use dynamically 
generated profile data to decide when and where to do dynamic partial
evaluation. Note that we are focusing on highly reflective runtime environments, so
profile data should be readily available. We also plan on using more sophisticated 
techniques for deciding when to inline function bodies. 

We have not yet addressed the efficiency of multiple dispatch itself, but we 
intend to follow the lead of Chambers and Chen, Kjm . The idea is that a 
generic function call expression is replaced by an inline binary decision tree, 
with leaves being direct calls to methods. Again, a method with generic calls 
replaced by decision trees is guardedly added to the generic function. When the 
generic functions involved are modified at runtime, the specialized version is 
removed and the original method, with generic function calls, is restored. 

At the time of this writing, the two most vexing issues are: 

• Deoptimizing methods that have an active call (related to on-stack replace- 
ment in Self), and 

• Avoiding a bad interaction between newly specialized methods and previ- 
ously specialized methods. When adding a newly specialized method causes 
previously specialized methods to be invalidated, we lose. 

We are exploring several approaches to both problems. 

The ideas of dynamic partial evaluation apply to any level of
interpretation, and the λdvm language and the Dynamic Virtual Machine are fairly high
level. The intent is that interpretation, including dynamic partial evaluation, 
will spend only enough time at this very high level to do optimizations specific 
to that level: in particular, optimizations with respect to user-defined types.
After a flurry of specialization, the goal is to translate to either a low-level 
virtual machine or to native instructions where further dynamic optimization 
may take place. The lower level representations of methods (that is, in terms 
of native instructions rather than DVM instructions) are cached using the same 
generic function mechanism. When assumptions made during code generation 
are violated, the native version is replaced by its original high level version. 
Abstract. Tag elimination is a program transformation for removing
unnecessary tagging and untagging operations from automatically generated
programs. Tag elimination was recently proposed as having immediate
applications in implementations of domain-specific languages (where
it can give a two-fold speedup), and may provide a solution to the
long-standing problem of Jones-optimal specialization in the typed setting.
This paper explains in more detail the role of tag elimination in the
implementation of domain-specific languages, and presents a number of
significant simplifications and a high-level, higher-order, typed self-applicable
interpreter. We show how tag elimination achieves Jones-optimality.



1 Introduction 

In recent years, substantial effort has been invested in the development of both
theory and tools for the rapid implementation of domain-specific languages
(DSLs). DSLs are formalisms that provide their users with notation appropriate
for a specific family of tasks at hand. A popular and viable strategy for
implementing domain-specific languages is to simply write an interpreter for the DSL
in some meta-language, and then to stage this interpreter either manually by
adding explicit staging annotations (multi-stage programming) or by
applying an automatic binding-time analysis (off-line partial evaluation). The
result of either of these steps is a staged interpreter. A staged interpreter is
essentially a translation from a subject-language (the DSL) to a target-language.
If there is already a (native code) compiler for the target-language, the approach
yields a simple (native code) compiler for the DSL at hand.

This paper is concerned with a costly problem which can arise when both 
the subject- and the meta-language are statically typed. In particular, when 

* Funded by a Postdoctoral Fellowship from the Swedish Research Council for
Engineering Sciences (TFR), grant number 221-96-403, and by subcontract #8911-48186
from Johns Hopkins University under NSF agreement Grant # EIA-9996430.

** Funded by the University of Copenhagen. 
the meta-language is typed, there is generally a need to introduce a “universal
datatype” to represent values. At runtime, having such a universal datatype
means that we have to perform tagging and untagging operations. When the
subject-language is untyped, we really do need these checks (e.g. in an ML
interpreter for Scheme). But when the subject-language is also statically typed
(e.g. an ML interpreter for ML), we do not really need the extra tags: they are
only there because we need them to statically type check the interpreter. When
such an interpreter is staged, it inherits this weakness and generates programs
that contain superfluous tagging and untagging operations. To give an idea of the
cost of these extra tags, here is the speedup obtained by removing them from two
sample programs (the factorial function applied to 12 and the Fibonacci function
applied to 10):



Term (fully inlined)    Speedup (after tag elimination)
fact 12                 2.6x
fib 10                  1.9x



The table shows that removing the superfluous tags from these two programs
speeds up their execution by factors of 2.6 and 1.9, respectively.
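The following Python sketch shows where such superfluous operations come from. It is our own illustration, not the SML/NJ code behind the measurements. `fact_tagged` mimics the output of a staged interpreter, in which every value passes through a universal datatype with constructors `E` and `F`. `fact` is the same function after the tags have been erased. The class and function names are assumptions made for this example, and the sketch says nothing about the size of the speedup.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class E:                      # tag for first-order data:  ... | E of D
    d: Any


@dataclass(frozen=True)
class F:                      # tag for functions:  F of V -> V | ...
    f: Callable[[Any], Any]


def untag_e(v):
    if not isinstance(v, E):
        raise TypeError("expected an E-tagged value")
    return v.d


def untag_f(v):
    if not isinstance(v, F):
        raise TypeError("expected an F-tagged value")
    return v.f


def fact_tagged(n):
    """Shaped like staged-interpreter output: every value passes through V."""
    k = untag_e(n)                                   # untag the argument
    if k == 0:
        return E(1)                                  # tag the result
    rec = F(fact_tagged)                             # tag the recursive function...
    return E(k * untag_e(untag_f(rec)(E(k - 1))))    # ...and untag it to call it


def fact(n):
    """The same function after tag elimination."""
    return 1 if n == 0 else n * fact(n - 1)


print(untag_e(fact_tagged(E(12))), fact(12))         # 479001600 479001600
```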

How, then, can we ensure that programs produced by the staged interpreter 
do not contain superfluous uses of the universal datatype? 

One possibility is to look for more expressive type systems that alleviate the
need for a universal datatype (such as dependent type systems). But it is not
clear that self-interpretation can be achieved in such languages. A more
pressing practical concern is that such systems lose decidable type inference,
which is a highly-valued feature of many typed functional programming languages.

Tag elimination is a recently proposed transformation that was designed to
remove the superfluous tags in a post-processing phase. Thus our approach is to
stage the interpreter into three distinct stages (rather than the traditional
two). The new extra stage, called tag elimination, is distinctly different from
the traditional partial evaluation (or specialization) stage. In essence, tag
elimination allows us to type check the subject program after it has been
interpreted. If it checks, superfluous tags are simply erased from the interpretation.



1.1 Jones-Optimality 

The problem of the superfluous tags is tightly coupled with the problem of
Jones-optimal self-interpretation in a statically-typed language. The significance
of Jones-optimality lies both in its relevance to the effective application of the
above strategy when a statically-typed meta-language is used, and in the fact that
the problem has remained open for over thirteen years, eluding numerous
significant efforts.

Intuitively, Jones-optimality tries to address the problem of whether for a 
given meta-language there exists a partial evaluator strong enough to remove 



^ Data based on 100,000 runs of each in SML/NJ. 
an entire level of “interpretive overhead” […, Section 6.4]. A key difficulty lies
in formalizing the notion of interpretive overhead. To this end, Jones chose to
formulate this in the special case where the program being specialized is an
interpreter. This restriction makes the question more specific, but there is still
the question of what removing a layer of interpretive overhead means, even
when we are specializing an interpreter. One choice is to require that
specializing the interpreter to a particular program produce a term that is no
more expensive than the original program. This, however, introduces the need for
a notion of cost, which is non-trivial to formalize. Another approach, which we
take here, is to require that the generated program be syntactically the same as
the original one. While this rules out performing additional reductions, we accept
that, as it still captures the essence of what we are trying to formalize.
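Written as a formula, the syntactic criterion adopted here reads roughly as follows. This is only a sketch of the idea. The names spec and sint are placeholders of ours for a partial evaluator and a self-interpreter, ⌜e⌝ is the program e encoded as data, and ≡ is syntactic equality up to α-conversion.

```latex
% Placeholder names (ours): spec = partial evaluator, sint = self-interpreter,
% \ulcorner e \urcorner = the program e encoded as first-order data.
\forall e.\quad \mathit{spec}\bigl(\mathit{sint},\, \ulcorner e \urcorner\bigr) \;\equiv\; e
\qquad \text{(syntactic equality, up to $\alpha$-conversion)}
```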

The relevance of Jones-optimality lies in the fact that, if we cannot achieve it,
staging/partial evaluation will no doubt produce sub-optimal programs for a
large variety of languages. Thus, resolving this problem for statically-typed
programming languages means that we have established that statically-typed
languages can be used to efficiently implement compilers for a large class of
domain-specific languages (including, for example, all languages that can be easily
mapped into any subset of the language that we consider).



1.2 Contribution and Summary of the Rest of the Paper 

This paper shows how tag elimination achieves Jones-optimality, and reports
(very briefly) on an implementation that supports our theoretical results. In
doing so, this paper extends previous theoretical work by presenting 1)
a typed, high-level language together with a self-interpreter for it (needed for
Jones-optimality), and 2) a substantially simplified version of tag elimination.
Previous implementation work was in a first-order language.

Section 2 presents a simply-typed programming language that will be used
as the main vehicle for presenting the self-interpreter and the proposed
transformation. The language has first-order data and higher-order values. We define
the new annotations and their interpretations. A specification of a tag elimination
analysis^ is presented in Section 3 as a set of inference rules defined by
induction on the structure of raw terms. In Section 4 we summarize the basic
semantic properties of tag elimination. In this section, we define the wrapper
and unwrapper functions that are needed to define a “fall-back” path for the
tag-elimination transformation.

Section 5 reviews interpreters. Section 6 addresses the relation between an
interpreter and a staged interpreter, emphasizing the utility of the notion of a
translation in this setting. In this section, we show how tag elimination analysis

^ In this paper, we present and focus on a specification of the analysis, not an algorithm
for carrying it out. We expect that the analysis can be implemented by a total 
function that yields a result that can be validated by our specification, but at this 
time, we have not yet established this formally. In this paper, we will often refer to 
this specification as “the analysis”. 
is sufficient to allow us to eliminate superfluous tags from typed staged
self-interpreters. In Section 7 we demonstrate the relevance of this result to the
problem of Jones-optimal specialization.

2 A Typed Language for Self-Interpretation 

First we present a programming language with first-order datatypes and with
higher-order values. The types $t \in T$ in this language are simply:

$t ::= D \mid V \mid t \to t$

The type D is for first-order data (like LISP S-expressions), the type V is for
higher-order values (a universal datatype), and the last production is for function
types. We can think of V as being generated by the following ML declaration:

datatype V = F of V -> V | E of D

But we do not need case analysis on V: we will only need value constructors
(tagging) and simple destructors (untagging) for our purposes here (writing
interpreters). We assume an infinite set of names X ranged over by x, and that
this set includes the special variables “nil, true, false”. The set of expressions E
is defined as follows:

$s ::= x \mid (s.s) \qquad u ::= \mathtt{car} \mid \mathtt{cdr} \mid \mathtt{atom?} \qquad o ::= \mathtt{cons} \mid \mathtt{equal?}$

$e ::= x \mid e\,e \mid \lambda x.e \mid \mathtt{fix}\,x.e \mid \text{‘}s \mid u\,e \mid o\,e\,e \mid \mathtt{if}\,e\,e\,e \mid E\,e \mid E^{-1}\,e \mid F\,e \mid F^{-1}\,e$

The type D will be inhabited by S-expressions represented by dotted-pairs s. One 
can use a distinct set for names of atoms, but it causes no confusion to simply 
use variables here. Note that substitution “does not do anything” with ‘s, or 
rather, it is the identity. We use a standard type system for this language. 

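$$\frac{\Gamma(x) = t}{\Gamma \vdash x : t} \qquad \frac{\Gamma \vdash e_1 : t_1 \to t_2 \quad \Gamma \vdash e_2 : t_1}{\Gamma \vdash e_1\,e_2 : t_2} \qquad \frac{\Gamma, x : t_1 \vdash e : t_2}{\Gamma \vdash \lambda x.e : t_1 \to t_2} \qquad \frac{\Gamma, x : t \vdash e : t}{\Gamma \vdash \mathtt{fix}\,x.e : t}$$

$$\frac{}{\Gamma \vdash \text{‘}s : D} \qquad \frac{\Gamma \vdash e : D}{\Gamma \vdash u\,e : D} \qquad \frac{\Gamma \vdash e_1 : D \quad \Gamma \vdash e_2 : D}{\Gamma \vdash o\,e_1\,e_2 : D} \qquad \frac{\Gamma \vdash e_1 : D \quad \Gamma \vdash e_2 : t \quad \Gamma \vdash e_3 : t}{\Gamma \vdash \mathtt{if}\,e_1\,e_2\,e_3 : t}$$

$$\frac{\Gamma \vdash e : D}{\Gamma \vdash E\,e : V} \qquad \frac{\Gamma \vdash e : V}{\Gamma \vdash E^{-1}\,e : D} \qquad \frac{\Gamma \vdash e : V \to V}{\Gamma \vdash F\,e : V} \qquad \frac{\Gamma \vdash e : V}{\Gamma \vdash F^{-1}\,e : V \to V}$$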
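Here is a small Python sketch of a checker for these rules, under one simplifying assumption of ours. Unlike the Curry-style λx.e and fix x.e above, binders carry their type, as in `("lam", x, t1, body)`. This turns checking into a single recursive function instead of type inference. Terms are nested tuples, and all names are hypothetical.

```python
# Plain types: "D", "V", or ("->", t1, t2).
D, V = "D", "V"


def arrow(t1, t2):
    return ("->", t1, t2)


def expect(actual, wanted):
    if actual != wanted:
        raise TypeError(f"expected {wanted}, got {actual}")


# Argument and result types of E, E^-1, F, F^-1, read off the last row of rules.
TAG_RULES = {"E": (D, V), "Einv": (V, D),
             "F": (arrow(V, V), V), "Finv": (V, arrow(V, V))}


def typeof(env, e):
    """Return t with env |- e : t, or raise TypeError (KeyError if unbound)."""
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "lam":                              # ("lam", x, t1, body)
        _, x, t1, body = e
        return arrow(t1, typeof({**env, x: t1}, body))
    if tag == "fix":                              # ("fix", x, t, body)
        _, x, t, body = e
        expect(typeof({**env, x: t}, body), t)
        return t
    if tag == "app":
        t_fun, t_arg = typeof(env, e[1]), typeof(env, e[2])
        if not (isinstance(t_fun, tuple) and t_fun[1] == t_arg):
            raise TypeError(f"cannot apply {t_fun} to {t_arg}")
        return t_fun[2]
    if tag == "quote":                            # 's : D
        return D
    if tag in ("car", "cdr", "atom?"):            # u e
        expect(typeof(env, e[1]), D)
        return D
    if tag in ("cons", "equal?"):                 # o e1 e2
        expect(typeof(env, e[1]), D)
        expect(typeof(env, e[2]), D)
        return D
    if tag == "if":
        expect(typeof(env, e[1]), D)
        t = typeof(env, e[2])
        expect(typeof(env, e[3]), t)
        return t
    if tag in TAG_RULES:                          # tagging and untagging
        t_in, t_out = TAG_RULES[tag]
        expect(typeof(env, e[1]), t_in)
        return t_out
    raise ValueError(f"unknown construct {tag!r}")


ident_v = ("lam", "x", V, ("var", "x"))
print(typeof({}, ("F", ident_v)))                 # V
```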

The fixed-point construct used here seems to worry some readers. In particular, 
they expect such a construct to either take an additional parameter, or have a type 
restricted to function types, or both. We choose this form primarily because it keeps 
the various type-preserving translations simple. It may help the reader to note that 
any term written in such a language as ours can be easily translated into a language 
that uses a fixed-point operator restricted to function types. Note, however, that in 
a language with a fixed-point operator such as ours, variables are not values. 



Tag Elimination and Jones-Optimality 261 



The type system enjoys weakening and substitution properties. 

Lemma 1 (Weakening, substitution). 

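1. $\Gamma \vdash e : t_1 \wedge x \notin FV(e) \cup dom(\Gamma) \implies x : t_2; \Gamma \vdash e : t_1$

2. $\Gamma \vdash e_1 : t_1 \wedge x : t_1; \Gamma \vdash e_2 : t_2 \implies \Gamma \vdash e_2[x := e_1] : t_2$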

Proof. All by easy inductions. □ 

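A standard big-step operational semantics $\hookrightarrow : E \to E$ for this language is
used:

$$\frac{e_1 \hookrightarrow \lambda x.e_3 \quad e_2 \hookrightarrow v_1 \quad e_3[x := v_1] \hookrightarrow v_2}{e_1\,e_2 \hookrightarrow v_2} \qquad \frac{}{\lambda x.e \hookrightarrow \lambda x.e} \qquad \frac{e[x := \mathtt{fix}\,x.e] \hookrightarrow v}{\mathtt{fix}\,x.e \hookrightarrow v} \qquad \frac{}{\text{‘}s \hookrightarrow \text{‘}s}$$

$$\frac{e \hookrightarrow \text{‘}(s_1.s_2)}{\mathtt{car}\,e \hookrightarrow \text{‘}s_1} \qquad \frac{e \hookrightarrow \text{‘}(s_1.s_2)}{\mathtt{cdr}\,e \hookrightarrow \text{‘}s_2} \qquad \frac{e \hookrightarrow \text{‘}x}{\mathtt{atom?}\,e \hookrightarrow \text{‘true}} \qquad \frac{e \hookrightarrow \text{‘}(s_1.s_2)}{\mathtt{atom?}\,e \hookrightarrow \text{‘false}} \qquad \frac{e_1 \hookrightarrow \text{‘}s_1 \quad e_2 \hookrightarrow \text{‘}s_2}{\mathtt{cons}\,e_1\,e_2 \hookrightarrow \text{‘}(s_1.s_2)}$$

$$\frac{e_1 \hookrightarrow \text{‘}s \quad e_2 \hookrightarrow \text{‘}s}{\mathtt{equal?}\,e_1\,e_2 \hookrightarrow \text{‘true}} \qquad \frac{e_1 \hookrightarrow \text{‘}s_1 \quad e_2 \hookrightarrow \text{‘}s_2 \quad s_1 \not\equiv s_2}{\mathtt{equal?}\,e_1\,e_2 \hookrightarrow \text{‘false}} \qquad \frac{e_1 \hookrightarrow v_1 \quad v_1 \not\equiv \text{‘false} \quad e_2 \hookrightarrow v_2}{\mathtt{if}\,e_1\,e_2\,e_3 \hookrightarrow v_2} \qquad \frac{e_1 \hookrightarrow \text{‘false} \quad e_3 \hookrightarrow v_3}{\mathtt{if}\,e_1\,e_2\,e_3 \hookrightarrow v_3}$$

$$\frac{e \hookrightarrow v}{E\,e \hookrightarrow E\,v} \qquad \frac{e \hookrightarrow E\,v}{E^{-1}\,e \hookrightarrow v} \qquad \frac{e \hookrightarrow v}{F\,e \hookrightarrow F\,v} \qquad \frac{e \hookrightarrow F\,v}{F^{-1}\,e \hookrightarrow v}$$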
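A direct Python transcription of these rules may help when reading them. It is a sketch that uses our own term representation: nested tuples, untyped binders `("lam", x, body)`, S-expression atoms as strings and dotted pairs as 2-tuples. The exception `Stuck` models terms for which no rule applies. Substitution is naive. That is adequate here only because evaluation starts from a closed term, so every substituted term is closed.

```python
class Stuck(Exception):
    """No rule applies, e.g. car of an atom or untagging with the wrong tag."""


def subst(e, x, v):
    """e[x := v]; naive, which is safe because v is always closed here."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "quote":
        return e                            # substitution is the identity on 's
    if tag in ("lam", "fix"):
        return e if e[1] == x else (tag, e[1], subst(e[2], x, v))
    return (tag,) + tuple(subst(a, x, v) for a in e[1:])


def quoted(v):
    if v[0] != "quote":
        raise Stuck(f"expected a quoted S-expression, got {v[0]}")
    return v[1]


def ev(e):
    """Big-step evaluation of a closed term, one branch per group of rules."""
    tag = e[0]
    if tag in ("lam", "quote"):
        return e                            # already values
    if tag == "fix":
        return ev(subst(e[2], e[1], e))     # fix x.e  ->  e[x := fix x.e]
    if tag == "app":
        f = ev(e[1])
        if f[0] != "lam":
            raise Stuck("applying a non-function")
        return ev(subst(f[2], f[1], ev(e[2])))
    if tag in ("car", "cdr", "atom?"):
        s = quoted(ev(e[1]))
        if tag == "atom?":
            return ("quote", "true" if isinstance(s, str) else "false")
        if isinstance(s, str):
            raise Stuck(f"{tag} of the atom {s}")
        return ("quote", s[0] if tag == "car" else s[1])
    if tag in ("cons", "equal?"):
        s1, s2 = quoted(ev(e[1])), quoted(ev(e[2]))
        if tag == "cons":
            return ("quote", (s1, s2))
        return ("quote", "true" if s1 == s2 else "false")
    if tag == "if":                         # anything but 'false counts as true
        return ev(e[3]) if ev(e[1]) == ("quote", "false") else ev(e[2])
    if tag in ("E", "F"):
        return (tag, ev(e[1]))              # tagging
    if tag in ("Einv", "Finv"):
        v = ev(e[1])
        if v[0] != tag[0]:
            raise Stuck(f"{tag} applied to a {v[0]} value")
        return v[1]                         # untagging
    raise Stuck(f"free variable or unknown construct: {e!r}")


# fix f. \l. if (atom? (cdr l)) (car l) (f (cdr l)): the last element of a list
last = ("fix", "f", ("lam", "l",
        ("if", ("atom?", ("cdr", ("var", "l"))),
               ("car", ("var", "l")),
               ("app", ("var", "f"), ("cdr", ("var", "l"))))))
print(ev(("app", last, ("quote", ("a", ("b", ("c", "nil")))))))   # ('quote', 'c')
```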



This semantics induces a set of values, namely, the largest set of terms on which
the semantics is idempotent. The set of values V is defined as follows:

$v ::= \lambda x.e \mid \text{‘}s \mid E\,v \mid F\,v$

Note that $V \subseteq E$. This containment is one of the reasons why our treatment is
often considered “syntactic” (as opposed to “denotational”). We find the
containment useful because it allows us to avoid having two similar but still slightly
different notions of various concepts, such as typing.

We can refine the type of evaluation to $E \to V$. The three basic properties
of values are the following:

Lemma 2 (Values).

1. $e_1 \hookrightarrow e_2 \wedge e_1 \hookrightarrow e_3 \implies e_2 \equiv e_3$, and

2. $e_1 \hookrightarrow e_2 \implies e_2 \in V$, and

3. $v \hookrightarrow v$.



^ We could have written E ‘s and F λx.e for the last two cases, but that puts
unnecessary restrictions on the untyped language. It also fails to give us “the largest”
set on which the semantics is idempotent. In the typed setting, the type system will
ensure that typed values will necessarily have the more restricted form.
Proof. All proofs are by simple inductions over the height of the derivation.

The semantics also enjoys type-preservation.

Lemma 3 (Type Preservation). $\Gamma \vdash e : t \wedge e \hookrightarrow v \implies \Gamma \vdash v : t$

Note, however, that it is still possible for some terms to “get stuck” in our
language, such as in trying to take the head (car) of an empty list (nil). We will
write $\equiv$ for syntactic equality of terms, up to α-conversion. For semantic
equality we will use the largest congruence when termination is observed. A
context C is a term with exactly one hole []. We will write C[e] for the
variable-capturing filling of the hole in C with the term e. Two terms $e_1, e_2 \in E$ are
observationally equivalent, written $e_1 \approx e_2$, when for every context C it is
the case that:

$(\exists v.\, C[e_1] \hookrightarrow v) \iff (\exists v.\, C[e_2] \hookrightarrow v)$



2.1 Semantics-Preserving Annotations 

The key idea behind the proposed approach to dealing with the interpretive
overhead is that the user writes an interpreter which includes some additional
annotations that have no effect on the semantics of the program but that do
have an effect on what happens to the superfluous tags. Thus, the programmer
writes the interpreter in a language of annotated terms. An annotated term
$e \in \hat{E}$ is a term where each occurrence of $E\,\_$, $E^{-1}\,\_$, $F\,\_$ and $F^{-1}\,\_$ is annotated
with one of two annotations $b \in B$:

$b ::= \mathsf{k} \mid \mathsf{e}$

where k stands for “keep” and e stands for “eliminate”. We write the annotation
as a subscript, as in $E_{\mathsf{k}}$ or $F^{-1}_{\mathsf{e}}$. Substitution is defined
on annotated terms in the standard manner. Any term can be lifted into an
annotated term. Lifting, written $\lceil \_ \rceil$, simply annotates every constructor and
destructor with the tag k. Lifting is substitutive: $\lceil e_1 \rceil[x := \lceil e_2 \rceil] \equiv \lceil e_1[x := e_2] \rceil$.
Lifting types and environments is simply the identity embedding.

Annotated contexts are defined similarly to terms, and so is lifting on contexts
$\lceil C \rceil$. The evaluation function on terms can be lifted to annotated terms, where
all the constructs are propagated during a computation without inspecting or
making any changes to the annotations. (Thus, it is OK to use a k-untag operation
to remove an e-tag in this semantics.)

The subject interpretation on annotated terms $|\_| : \hat{E} \to E$ simply forgets the
annotations, and the target interpretation $\|\_\| : \hat{E} \to E$ eliminates
^ Note that, because we have datatypes and some datatype operations can get stuck,
one should ideally use a notion of equivalence which distinguishes between getting
stuck and diverging. In this paper, we avoid this distinction for the sake of simplicity.
Because nowhere in our treatment do we exchange a possibly-stuck term with a
possibly-non-terminating term (or the other way around), we expect our results to
generalize.
constructs annotated with e and simply drops the k annotation from the others.
For example, $\|F_{\mathsf{e}}\,(\mathtt{car}\,x)\| = \mathtt{car}\,x$. Note that $|\lceil e \rceil| = \|\lceil e \rceil\| = e$, and that both
notions of erasure are substitutive. Both notions of erasure are also onto. These
facts will allow us to keep reasoning with observational equivalence simple.
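Lifting and the two erasures can be sketched in Python as follows. This assumes (our choice) that an annotated tag node carries the annotation as an extra field, e.g. `("F", "e", body)`, while an unannotated one is `("F", body)`. The usage lines restate the example $\|F_{\mathsf{e}}\,(\mathtt{car}\,x)\| = \mathtt{car}\,x$ and the identity $|\lceil e\rceil| = \|\lceil e\rceil\| = e$.

```python
TAGS = ("E", "Einv", "F", "Finv")


def lift(e):
    """Lifting: annotate every tagging/untagging operation with k."""
    tag = e[0]
    if tag in ("var", "quote"):
        return e
    if tag in ("lam", "fix"):
        return (tag, e[1], lift(e[2]))
    if tag in TAGS:
        return (tag, "k", lift(e[1]))
    return (tag,) + tuple(lift(a) for a in e[1:])


def erase(e, drop_eliminated):
    """Shared traversal behind both erasures of an annotated term."""
    tag = e[0]
    if tag in ("var", "quote"):
        return e
    if tag in ("lam", "fix"):
        return (tag, e[1], erase(e[2], drop_eliminated))
    if tag in TAGS:
        b, sub = e[1], erase(e[2], drop_eliminated)
        return sub if (drop_eliminated and b == "e") else (tag, sub)
    return (tag,) + tuple(erase(a, drop_eliminated) for a in e[1:])


def subject(e):   # |e|: forget all annotations
    return erase(e, drop_eliminated=False)


def target(e):    # ||e||: remove e-annotated operations, keep k-annotated ones
    return erase(e, drop_eliminated=True)


ex = ("F", "e", ("car", ("var", "x")))
print(target(ex))     # ('car', ('var', 'x'))
print(subject(ex))    # ('F', ('car', ('var', 'x')))
```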

The subject interpretation allows us to lift observational equivalence to
annotated terms, that is, we will define equivalence on annotated terms as
follows:

$e_1 \approx e_2 \iff |e_1| \approx |e_2|$

This means that, from the user’s point of view, tagging and untagging operations
annotated with e or k are semantically the same. Thus, we can really think of
these annotations as being purely hints to the tag-elimination analysis that can
affect only performance. In this paper, if one “hint” is wrong, no tag elimination
will be performed at all. The goal of this paper is not, however, to demonstrate
that there is a robust analysis that solves optimality, but rather that there is
an analysis at all. (See Makholm for some ideas to alleviate this practical
problem.)



3 A Specification of a Tag Elimination Analysis 

In this section, we present a specification of a new tag-elimination analysis.
The analysis will be presented as a type system defining a judgment $\Gamma \vdash e/a$.
Intuitively, the judgment says “a describes the type of e before and after the extra
tags are eliminated”. Annotated types $\hat{t} \in \hat{T}$ are basically types carrying names
of tags (either E or F) in certain positions, and are defined as follows:

$\hat{t} ::= D \mid V \mid \hat{t} \to \hat{t} \mid E\,\hat{t} \mid F\,\hat{t}$

We will use two strict subsets ($C \subset \hat{T}$ and $A \subset \hat{T}$) that identify two special
families of refinements:

$c ::= V \mid E\,D \mid F\,(c \to c)$

$a ::= D \mid c \mid a \to a$

An annotated type c identifies a subset of values (and terms) of type V, and an
annotated type a corresponds to legitimate type-specializations of a (subject)
value of any type. In the case of the first production for c, the subset is the
whole set. In the second case, it is the subset of values that have an E tag as the
outermost (or topmost) construct. In the third case, it is the values that have an
F tag. For terms, the annotated type identifies those terms that either diverge or
evaluate to values identified by the annotated type.

In the context of the work of Hughes and Danvy, an annotated type E D 
can be seen as describing a “type-specialization path” from a value of type V 
to a value of type D. The additional tag information in the annotated type tells 
us that we achieve this “type specialization” (semantically) by eliminating an E
tag. The subject $|a|$ and target $\|a\|$ interpretations of annotated types are:

$|D| = D, \quad |c| = V, \quad |a_1 \to a_2| = |a_1| \to |a_2|,$

$\|D\| = \|E\,D\| = D, \quad \|V\| = V, \quad \|F\,a\| = \|a\|, \quad \|a_1 \to a_2\| = \|a_1\| \to \|a_2\|.$

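The tag elimination analysis is defined as follows:

$$\frac{\Gamma(x) = a}{\Gamma \vdash x/a} \qquad \frac{\Gamma \vdash e_1/a_1 \to a_2 \quad \Gamma \vdash e_2/a_1}{\Gamma \vdash e_1\,e_2/a_2} \qquad \frac{\Gamma, x/a_1 \vdash e/a_2}{\Gamma \vdash \lambda x.e/a_1 \to a_2} \qquad \frac{\Gamma, x/a \vdash e/a}{\Gamma \vdash \mathtt{fix}\,x.e/a}$$

$$\frac{}{\Gamma \vdash \text{‘}s/D} \qquad \frac{\Gamma \vdash e/D}{\Gamma \vdash u\,e/D} \qquad \frac{\Gamma \vdash e_1/D \quad \Gamma \vdash e_2/D}{\Gamma \vdash o\,e_1\,e_2/D} \qquad \frac{\Gamma \vdash e_1/D \quad \Gamma \vdash e_2/a \quad \Gamma \vdash e_3/a}{\Gamma \vdash \mathtt{if}\,e_1\,e_2\,e_3/a}$$

$$\frac{\Gamma \vdash e/D}{\Gamma \vdash E_{\mathsf{k}}\,e/V} \qquad \frac{\Gamma \vdash e/V}{\Gamma \vdash E^{-1}_{\mathsf{k}}\,e/D} \qquad \frac{\Gamma \vdash e/V \to V}{\Gamma \vdash F_{\mathsf{k}}\,e/V} \qquad \frac{\Gamma \vdash e/V}{\Gamma \vdash F^{-1}_{\mathsf{k}}\,e/V \to V}$$

$$\frac{\Gamma \vdash e/D}{\Gamma \vdash E_{\mathsf{e}}\,e/E\,D} \qquad \frac{\Gamma \vdash e/E\,D}{\Gamma \vdash E^{-1}_{\mathsf{e}}\,e/D} \qquad \frac{\Gamma \vdash e/c_1 \to c_2}{\Gamma \vdash F_{\mathsf{e}}\,e/F\,(c_1 \to c_2)} \qquad \frac{\Gamma \vdash e/F\,(c_1 \to c_2)}{\Gamma \vdash F^{-1}_{\mathsf{e}}\,e/c_1 \to c_2}$$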

The rules for the core constructs and for k-annotated tags are completely standard:
they mirror the type system. The rules for the tagging and untagging operations
annotated with “eliminate” are new, and assign them special annotated types.
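The analysis can likewise be prototyped as a checker. The Python sketch below is illustrative only. As before, binders carry an annotated type, an assumption of ours that sidesteps inference. Annotated tag nodes look like `("F", "e", body)`, and `subject_type`/`target_type` compute |a| and ‖a‖. A wrong “eliminate” hint surfaces as a `TypeError`, which is exactly the case in which no tags can be eliminated.

```python
# Annotated types: "D", "V", ("->", a1, a2), ("E", a), ("F", a).
D, V = "D", "V"
ED = ("E", D)


def arr(a1, a2):
    return ("->", a1, a2)


def need(actual, wanted):
    if actual != wanted:
        raise TypeError(f"expected {wanted}, got {actual}")


def is_c(a):
    """Membership in C:  c ::= V | E D | F (c -> c)."""
    if a == V or a == ED:
        return True
    return (isinstance(a, tuple) and a[0] == "F" and isinstance(a[1], tuple)
            and a[1][0] == "->" and is_c(a[1][1]) and is_c(a[1][2]))


KEEP_RULES = {"E": (D, V), "Einv": (V, D), "F": (arr(V, V), V), "Finv": (V, arr(V, V))}


def analyze(env, e):
    """Return a with env |- e / a, or raise TypeError."""
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag in ("lam", "fix"):                     # binders carry an annotated type
        _, x, a, body = e
        a_body = analyze({**env, x: a}, body)
        if tag == "lam":
            return arr(a, a_body)
        need(a_body, a)
        return a
    if tag == "app":
        af, ax = analyze(env, e[1]), analyze(env, e[2])
        if not (isinstance(af, tuple) and af[0] == "->" and af[1] == ax):
            raise TypeError(f"cannot apply {af} to {ax}")
        return af[2]
    if tag == "quote":
        return D
    if tag in ("car", "cdr", "atom?", "cons", "equal?"):
        for sub in e[1:]:
            need(analyze(env, sub), D)
        return D
    if tag == "if":
        need(analyze(env, e[1]), D)
        a = analyze(env, e[2])
        need(analyze(env, e[3]), a)
        return a
    if tag not in KEEP_RULES:
        raise ValueError(f"unknown construct {tag!r}")
    b, a = e[1], analyze(env, e[2])               # annotated tag node
    if b == "k":                                  # same as the plain type system
        a_in, a_out = KEEP_RULES[tag]
        need(a, a_in)
        return a_out
    if tag == "E":                                # E_e : D  ~>  E D
        need(a, D)
        return ED
    if tag == "Einv":                             # E^-1_e : E D  ~>  D
        need(a, ED)
        return D
    if tag == "F":                                # F_e : c1 -> c2  ~>  F (c1 -> c2)
        if not is_c(("F", a)):
            raise TypeError(f"F_e needs c1 -> c2, got {a}")
        return ("F", a)
    if not (a != V and is_c(a) and a[0] == "F"):  # F^-1_e : F (c1 -> c2)  ~>  c1 -> c2
        raise TypeError(f"F^-1_e needs F (c1 -> c2), got {a}")
    return a[1]


def subject_type(a):    # |a|
    if a == D:
        return D
    if a[0] == "->":
        return arr(subject_type(a[1]), subject_type(a[2]))
    return V            # every c in C stands for V


def target_type(a):     # ||a||
    if a in (D, V):
        return a
    if a == ED:
        return D
    if a[0] == "F":
        return target_type(a[1])
    return arr(target_type(a[1]), target_type(a[2]))


good = ("F", "e", ("lam", "x", ED, ("E", "e", ("car", ("Einv", "e", ("var", "x"))))))
a = analyze({}, good)
print(a, subject_type(a), target_type(a))
```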

This type system also enjoys weakening and substitution properties, and
the semantics also enjoys an analog of type-preservation on closed terms.
The analysis includes the type system, in that any type judgment $\vdash e : t$ has
a canonical corresponding analysis judgment $\vdash \lceil e \rceil/t$. We can also establish
stronger properties of the analysis:

Lemma 4 (Double Typing).

$\vdash |e| : |a| \;\Longleftarrow\; \vdash e/a \;\Longrightarrow\; \vdash \|e\| : \|a\|$

Proof. Both directions by simple inductions. □

This lemma captures the fact that the analysis performs (at least) two things
implicitly: first, typing the term without the annotations, and second, typing the term
after the e-marked tagging and untagging operations have been eliminated. Note
that we would not be able to prove this property if we used a instead of c in the
last rules. Next we prove stronger, semantic properties about the analysis.

4 Semantic Properties of Tag Elimination 

Tag elimination changes the type of a term, and so, necessarily, changes the
semantics of the term. Fortunately, it is possible to give a simple and accurate
account of this change in semantics using so-called wrapper/unwrapper
functions $W, U : B \times A \to E$ (which are a bit more general than the classic
embedding/projection pair discussed in the next section):

$W_{b,D} = \lambda x.x \qquad U_{b,D} = \lambda x.x$

$W_{b,V} = \lambda x.x \qquad U_{b,V} = \lambda x.x$

$W_{b,E\,D} = \lambda x.E_b\,x \qquad U_{b,E\,D} = \lambda x.E^{-1}_b\,x$

$W_{b,F\,a} = \lambda x.F_b\,(W_{b,a}\,x) \qquad U_{b,F\,a} = \lambda x.U_{b,a}\,(F^{-1}_b\,x)$

$W_{b,a_1 \to a_2} = \lambda f.\lambda x.W_{b,a_2}\,(f\,(U_{b,a_1}\,x)) \qquad U_{b,a_1 \to a_2} = \lambda f.\lambda x.U_{b,a_2}\,(f\,(W_{b,a_1}\,x))$
For simplicity, we will write $W_a$ (and similarly $U_a$) for $|W_{b,a}| = \|W_{\mathsf{k},a}\|$. The
wrapper and unwrapper functions at a given type a can be seen as completely
determining a “type-specialization path”.
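The wrappers and unwrappers can be mirrored at the meta-level in Python. This sketch works on Python values rather than building the terms $W_{b,a}$ and $U_{b,a}$, and it uses our own classes `E` and `F` for tagged values. It is meant only to make the round trip $U_a\,(W_a\,v) \approx v$ (Lemma 7) concrete on one example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class E:                 # E-tagged first-order datum
    d: Any


@dataclass(frozen=True)
class F:                 # F-tagged function on V
    f: Callable[[Any], Any]


def untag(cls, v):
    """E^-1 / F^-1: get stuck (raise) on the wrong tag."""
    if not isinstance(v, cls):
        raise TypeError(f"expected {cls.__name__} tag, got {v!r}")
    return v.d if cls is E else v.f


# Annotated types: "D", "V", ("E", "D"), ("F", a), ("->", a1, a2).
def W(a):
    """Wrapper W_a : ||a|| -> |a|, adding the tags recorded in a."""
    if a in ("D", "V"):
        return lambda x: x
    if a == ("E", "D"):
        return lambda x: E(x)
    if a[0] == "F":
        return lambda x: F(W(a[1])(x))
    _, a1, a2 = a
    return lambda f: lambda x: W(a2)(f(U(a1)(x)))


def U(a):
    """Unwrapper U_a : |a| -> ||a||, removing those tags again."""
    if a in ("D", "V"):
        return lambda x: x
    if a == ("E", "D"):
        return lambda x: untag(E, x)
    if a[0] == "F":
        return lambda x: U(a[1])(untag(F, x))
    _, a1, a2 = a
    return lambda f: lambda x: U(a2)(f(W(a1)(x)))


a = ("F", ("->", ("E", "D"), ("E", "D")))   # ||a|| = D -> D, |a| = V
car = lambda s: s[0]                        # a tag-free value of type D -> D
tagged = W(a)(car)                          # F-tagged function on E-tagged data
print(tagged.f(E(("x", "y"))))              # E(d='x')
print(U(a)(tagged)(("x", "y")))             # x
```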

Lemma 5 (Wrapper/Unwrapper Types and Annotated Types).

1. $\vdash W_{\mathsf{k},a}/\|a\| \to |a|$

2. $\vdash W_{\mathsf{e},a}/\|a\| \to a$

3. $\vdash |W_{b,a}| : \|a\| \to |a|$

4. $\vdash \|W_{\mathsf{k},a}\| : \|a\| \to |a|$

5. $\vdash \|W_{\mathsf{e},a}\| : \|a\| \to \|a\|$

The unwrapper function has the dual types.

Proof. The first two are by simple inductions. The last two come from the basic
properties of erasure, and the first two properties. In all cases, we have to
establish the properties of the unwrapper function simultaneously. □



Lemma 6 (Simulating Erasure). For all $\vdash e/a$ and $\vdash v/a$ we have

$|e| \hookrightarrow |v| \iff \|e\| \hookrightarrow \|v\|$

Proof. The forward direction is by a simple induction on the height of the
derivation. The backward direction is by an induction on the lexicographic order
generated by the height of the derivation and then the size of the term. □

Corollary 1. Lemma 6 has a number of useful consequences:

1. For $\vdash e_1/a$ and $\vdash e_2/a$, we have $|e_1| \approx |e_2| \iff \|e_1\| \approx \|e_2\|$.

2. For $\vdash e/t$ we have $\|e\| \approx |e|$.



Lemma 7 (Projecting Embeddings). For all $\vdash v : \|a\|$,

$U_a\,(W_a\,v) \approx v$
Proof. By induction on the structure of the annotated types. □

For any e such that $\vdash |e| : |a|$, the tag elimination transformation TE(e, a)
is defined as:

$TE(e, a) = \begin{cases} \|e\| & \text{if } \vdash e/a \\ U_a\,|e| & \text{otherwise} \end{cases}$



Note that the input to the tag-elimination transformation is an annotated term 
e and an annotated type a. Both the annotations and the annotated type are 
used to ensure that the transformation is functional. The study of inference 
techniques for the annotations and the annotated type can alleviate the need 
for the annotations and the annotated type, but we leave this for future work. 
Leaving out inference is pragmatically well-motivated, because it is easy for the 
programmer to provide these inputs. 



Theorem 1 (Extensional Semantics of Tag Elimination). For all $\vdash |e| : |a|$,

$TE(e, a) \approx U_a\,|e|$

The proof technique used in a previous study on tag elimination works here.

Proof. We only need to prove that for all $\vdash e/a$ we have

$\|e\| \approx U_a\,|e|$

This proof proceeds by induction on the structure of the annotated type a. In the
case of D and V, the proof comes from the fact that $\|e\| \approx |e|$ when the annotated
type a is simply a type. In the case of E D and F a, the proof uses the definition of
erasure and the induction hypothesis. The case of arrows is the most interesting.
It is done using simulation, extensionality, lifting, the ontoness of both erasure
functions, and the second part of Corollary 1. □



5 Interpreters 



“To explain what interpreters do it is worthwhile to start by discussing
the differences between interpreting and translation.”

Introduction on web-page of 
Russian Interpreters Co-op (RIG). 



In order to be able to address the issue of Jones-optimality formally and to 
establish that a certain program is indeed an interpreter, we will need to review 
some basic issues of encoding and expressibility. Because we are interested in 
typed interpreters, we will begin by refining our notation and define the sets of 
typed terms and values as 





$E_{\Gamma \vdash t} = \{\, e \mid \Gamma \vdash e : t \,\}$ and $V_{\Gamma \vdash t} = \{\, v \mid \Gamma \vdash v : t \,\}$

And we will write $E_t$ and $V_t$ when Γ is empty. By proving type preservation on
closed terms, we can now give evaluation a finer type $E_t \to V_t$.

We define a first-order datatype as a type D whose values can be tested
for meta-level syntactic equality within the language. That is, for all $v_1, v_2 \in V_D$,

$v_1 \equiv v_2 \iff v_1 \approx v_2$

Note that, in general, it is desirable that a language have types which do not have
this property. In meta-programming settings, it is dangerously easy to thus
“trivialize” observational equivalence for all types. In particular, if
observational equality is the same as syntactic equality, many interesting local
optimizations such as β-reduction become semantically unsound. The type D in
the language presented above is a first-order datatype.

A programming language has syntactic self-representation iff 1) it has
a first-order datatype D, and 2) there exists a full embedding $\ulcorner \_ \urcorner : E \to V_D$,
meaning that:

• $\ulcorner e \urcorner$ is defined for all e, and

• $\ulcorner \_ \urcorner$ has a left inverse called $\lfloor \_ \rfloor : V_D \to E$, that is, $\lfloor \ulcorner e \urcorner \rfloor = e$.

The left inverse does not have to be a total function: it just needs to be defined
on all elements of the range (image) of the embedding.

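For our language, we can define the function and its left-inverse as follows:

$\ulcorner x \urcorner = x$
$\ulcorner e_1\,e_2 \urcorner = (\mathtt{apply}.(\ulcorner e_1 \urcorner.\ulcorner e_2 \urcorner))$
$\ulcorner \lambda x.e \urcorner = (\mathtt{lambda}.(x.\ulcorner e \urcorner))$
$\ulcorner \mathtt{fix}\,x.e \urcorner = (\mathtt{fix}.(x.\ulcorner e \urcorner))$
$\ulcorner \text{‘}s \urcorner = (\mathtt{quote}.s)$
$\ulcorner \mathtt{car}\,e \urcorner = (\mathtt{car}.\ulcorner e \urcorner)$
$\ulcorner \mathtt{cdr}\,e \urcorner = (\mathtt{cdr}.\ulcorner e \urcorner)$
$\ulcorner \mathtt{cons}\,e_1\,e_2 \urcorner = (\mathtt{cons}.(\ulcorner e_1 \urcorner.\ulcorner e_2 \urcorner))$
$\ulcorner \mathtt{equal?}\,e_1\,e_2 \urcorner = (\mathtt{equal?}.(\ulcorner e_1 \urcorner.\ulcorner e_2 \urcorner))$
$\ulcorner \mathtt{if}\,e_1\,e_2\,e_3 \urcorner = (\mathtt{if}.(\ulcorner e_1 \urcorner.(\ulcorner e_2 \urcorner.\ulcorner e_3 \urcorner)))$
$\ulcorner E\,e \urcorner = (\mathtt{tagE}.\ulcorner e \urcorner)$
$\ulcorner E^{-1}\,e \urcorner = (\mathtt{untagE}.\ulcorner e \urcorner)$
$\ulcorner F\,e \urcorner = (\mathtt{tagF}.\ulcorner e \urcorner)$
$\ulcorner F^{-1}\,e \urcorner = (\mathtt{untagF}.\ulcorner e \urcorner)$

and,

$\lfloor x \rfloor = x$
$\lfloor (\mathtt{apply}.(e_1.e_2)) \rfloor = \lfloor e_1 \rfloor\,\lfloor e_2 \rfloor$
$\lfloor (\mathtt{lambda}.(x.e)) \rfloor = \lambda \lfloor x \rfloor.\lfloor e \rfloor$
$\lfloor (\mathtt{fix}.(x.e)) \rfloor = \mathtt{fix}\,\lfloor x \rfloor.\lfloor e \rfloor$
$\lfloor (\mathtt{quote}.s) \rfloor = \text{‘}s$
$\lfloor (\mathtt{car}.e) \rfloor = \mathtt{car}\,\lfloor e \rfloor$
$\lfloor (\mathtt{cdr}.e) \rfloor = \mathtt{cdr}\,\lfloor e \rfloor$
$\lfloor (\mathtt{cons}.(e_1.e_2)) \rfloor = \mathtt{cons}\,\lfloor e_1 \rfloor\,\lfloor e_2 \rfloor$
$\lfloor (\mathtt{equal?}.(e_1.e_2)) \rfloor = \mathtt{equal?}\,\lfloor e_1 \rfloor\,\lfloor e_2 \rfloor$
$\lfloor (\mathtt{if}.(e_1.(e_2.e_3))) \rfloor = \mathtt{if}\,\lfloor e_1 \rfloor\,\lfloor e_2 \rfloor\,\lfloor e_3 \rfloor$
$\lfloor (\mathtt{tagE}.e) \rfloor = E\,\lfloor e \rfloor$
$\lfloor (\mathtt{untagE}.e) \rfloor = E^{-1}\,\lfloor e \rfloor$
$\lfloor (\mathtt{tagF}.e) \rfloor = F\,\lfloor e \rfloor$
$\lfloor (\mathtt{untagF}.e) \rfloor = F^{-1}\,\lfloor e \rfloor$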

It is easy to see that $\lfloor \ulcorner e \urcorner \rfloor = e$. Such encoding/decoding pairs have sometimes
been called reify and reflect. This terminology was promoted by Brian Cantwell
Smith. Reify provides us with a way of “materializing” or “representing”
^ The second requirement is not hard: it is enough to know that D can represent the 
natural numbers. The detailed treatment here is primarily expository. 
terms within the language, and reflect provides us with a way of interpreting
an internal representation back into a (meta-level) term. Note that all these
functions exist at the meta-level, and that expressing them within the language
requires first defining them at the meta-level.

As with evaluation, we will be more interested in the “subject-typed” versions
of these functions: $\ulcorner \_ \urcorner_t : E_t \to V_D$ and $\lfloor \_ \rfloor_t : V_D \to E_t$, where the first one is
achieved by restricting the input to be well-typed, and the second by restricting
the output to being well-typed.
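Here is a Python sketch of the encoding and its left inverse. It uses our own term representation: nested tuples for terms, strings for S-expression atoms and 2-tuples for dotted pairs. It covers exactly the constructs in the listing above, so, like that listing, it has no case for atom?.

```python
UNARY = {"car": "car", "cdr": "cdr", "E": "tagE", "Einv": "untagE",
         "F": "tagF", "Finv": "untagF"}
BINARY = {"app": "apply", "cons": "cons", "equal?": "equal?"}
BINDERS = {"lam": "lambda", "fix": "fix"}
# Inverse tables used by decode.
UNARY_INV = {v: k for k, v in UNARY.items()}
BINARY_INV = {v: k for k, v in BINARY.items()}
BINDERS_INV = {v: k for k, v in BINDERS.items()}


def encode(e):
    """Reify: a term as an S-expression (a value of type D)."""
    tag = e[0]
    if tag == "var":
        return e[1]
    if tag == "quote":
        return ("quote", e[1])
    if tag in UNARY:
        return (UNARY[tag], encode(e[1]))
    if tag in BINARY:
        return (BINARY[tag], (encode(e[1]), encode(e[2])))
    if tag in BINDERS:
        return (BINDERS[tag], (e[1], encode(e[2])))
    if tag == "if":
        return ("if", (encode(e[1]), (encode(e[2]), encode(e[3]))))
    raise ValueError(f"no encoding listed for {tag!r}")


def decode(s):
    """Reflect: left inverse of encode, only meaningful on its image."""
    if isinstance(s, str):
        return ("var", s)
    head, body = s
    if head == "quote":
        return ("quote", body)
    if head in UNARY_INV:
        return (UNARY_INV[head], decode(body))
    if head in BINARY_INV:
        return (BINARY_INV[head], decode(body[0]), decode(body[1]))
    if head in BINDERS_INV:
        return (BINDERS_INV[head], body[0], decode(body[1]))
    if head == "if":
        return ("if", decode(body[0]), decode(body[1][0]), decode(body[1][1]))
    raise ValueError(f"not in the image of encode: {s!r}")


print(encode(("car", ("var", "x"))))    # ('car', 'x')
```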

With syntactic representation in hand, it is tempting to view interpreters as
a program (call it direct) expressing the following function:

$(\lfloor \_ \rfloor_t ; \hookrightarrow_t) : V_D \to V_t$

Because such a function produces a value of the same (subject) type as the
(subject) term being interpreted, we will call such programs direct interpreters. But it
turns out that expressing such an interpreter in a statically typed programming
language (such as the one at hand) is a rather subtle matter. In fact, only
recently has some work on programming type-indexed values in ML
given a hint of how such a function can be expressed. But even then, it is known
that we can express such an “interpreter” for each type, but it is not known whether
there is one term that we can call the interpreter and that would work for all
types.



5.1 Expressibility and Admissibility of Encoding/Decoding 

While the encoding and decoding functions presented above would generally be
enough for expressing an interpreter in an untyped setting, they are generally not
enough in a typed setting. To clarify this point, we will analyze the expressibility
of these functions and of interpreters.

A partial (meta-level) function $f : V_{t_1} \rightharpoonup V_{t_2}$ is expressed by a term $e_f \in E_{t_1 \to t_2}$
when for all $v \in V_{t_1}$

$e_f\,v \approx f(v)$.

As a simple example, for any t, the function $id : V_t \to V_t$ is expressed by the
term $e_{id} = \lambda x.x \in V_{t \to t}$. In contrast, any function that distinguishes between
the terms $\lambda x.(\lambda y.y)\,x$ and $\lambda x.x$ would not be expressible. A partial meta-level
function $f : V_{t_1} \rightharpoonup V_{t_2}$ is admissible when for all $v_1, v_2 \in V_{t_1}$ such that $v_1 \approx v_2$,
if $f(v_1)$ is defined then 1) $f(v_2)$ is defined, and 2) $f(v_1) \approx f(v_2)$. Expressible
functions are admissible, but not necessarily the converse. Thus, admissibility
helps in establishing negative statements on what can be expressed.

As is, the two untyped functions described above cannot be expressed in our
language: in both cases, they do not have the right type. At the very least, they
need to be restricted to values in both the domain and the co-domain.

Here we omit the formal definition of “expresses” for reasons of space. 



If we restrict the decoding function to values and its result to values of type
V → V, we get a function of type $V_D \to V_{V \to V}$. This decoding function is
admissible (in fact, even though we do not prove it, we expect that it is expressible).
Because one of the main things that we generally want to do with expressions is
to evaluate them after decoding them, a decoding function restricted to values
almost (but not quite) models an interpreter.

However, if we restrict the encoding function to values, and then further
restrict it to values of some type, say, type V → V, we get a function of
type $V_{V \to V} \to V_D$. This function distinguishes between operationally
equivalent terms; therefore, it is not even admissible.

Because of the subtleties involved in expressing a direct interpreter, a more
commonly used technique for implementing interpreters involves the use of a
universal datatype. We define a universal datatype as a type V that allows us
to simulate values of any type by a value of one (universal) type. We can
formalize the notion of simulation concisely as follows: there must exist a universal
embedding function $w_t : V_t \to V_V$ such that:

$v_1 \approx v_2 \iff w_t(v_1) \approx w_t(v_2)$

We can establish that the datatype V in our programming language is a universal
datatype by using a family of terms $E_t$ and $P_t$ and showing that the latter is a
left-inverse of the former:



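$E_D = \lambda x.E\,x \qquad P_D = \lambda x.E^{-1}\,x$

$E_V = \lambda x.x \qquad P_V = \lambda x.x$

$E_{t_1 \to t_2} = \lambda f.F\,(\lambda x.E_{t_2}\,(f\,(P_{t_1}\,x))) \qquad P_{t_1 \to t_2} = \lambda f.\lambda x.P_{t_2}\,(F^{-1}\,f\,(E_{t_1}\,x))$

And it is easy to show that $P_t\,(E_t\,v) \approx v$.

Lemma 8 (Projecting Embeddings). $P_t\,(E_t\,v) \approx v$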

Proof. By induction on the structure of the types. □ 

Remark 1. Note that the fact that we do not need to apply the induction
hypothesis in the case of V is essential for the ability to do the proof by induction on
the structure of types.
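The embedding/projection pair can be sketched in Python at the meta-level. The classes `E` and `F` stand in for the constructors of V, and `embed`/`project` are our names for $E_t$ and $P_t$. The example exercises Lemma 8 at the higher-order type (D → D) → D. Structurally this mirrors the W/U sketch for annotated types, as the paper's remark that W/U generalize this pair suggests.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class E:
    d: Any


@dataclass(frozen=True)
class F:
    f: Callable[[Any], Any]


def untag(cls, v):
    if not isinstance(v, cls):
        raise TypeError(f"expected {cls.__name__} tag, got {v!r}")
    return v.d if cls is E else v.f


# Plain types: "D", "V", ("->", t1, t2).
def embed(t):
    """E_t : values of type t -> values of the universal type V."""
    if t == "D":
        return lambda x: E(x)
    if t == "V":
        return lambda x: x              # no induction needed at V (Remark 1)
    _, t1, t2 = t
    return lambda f: F(lambda x: embed(t2)(f(project(t1)(x))))


def project(t):
    """P_t : V -> t, a left inverse of E_t up to equivalence (Lemma 8)."""
    if t == "D":
        return lambda x: untag(E, x)
    if t == "V":
        return lambda x: x
    _, t1, t2 = t
    return lambda f: lambda x: project(t2)(untag(F, f)(embed(t1)(x)))


t = ("->", ("->", "D", "D"), "D")       # (D -> D) -> D
apply_to_a = lambda g: g("a")
round_trip = project(t)(embed(t)(apply_to_a))
print(round_trip(lambda s: ("b", s)))   # ('b', 'a')
```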

Such a universal datatype plays a crucial role in allowing us to express simple
interpreters in non-dependently typed programming languages. In particular,
it allows us to implement typed interpreters by what we will call an indirect
interpreter. A term is an indirect interpreter (call it indirect) if it expresses the
function:

$(\lfloor \_ \rfloor_t ; \hookrightarrow_t ; w_t) : V_D \to V_V$

While this shift from direct to indirect interpreters makes writing interpreters
easier, it also introduces the very overhead that the tag elimination
transformation will need to remove. In our Scheme-based implementation the
self-interpreter is essentially as follows:

® Unfortunately, space does not allow us to give all the details here. 
(fix newenv-eval (lambda env (fix myeval (lambda e
  (if (atom? e) (app env e)
  (if (equal? (car e) (quote lambda))
      (.tagF. (lambda x (app (app newenv-eval
                                  (lambda y (if (equal? y (car (cdr e))) x (app env y))))
                             (car (cdr (cdr e))))))
  (if (equal? (car e) (quote app))
      (app (.untagF. (app myeval (car (cdr e))))
           (app myeval (car (cdr (cdr e)))))
  (if (equal? (car e) (quote tagF))
      (tagF (.untagF. (app myeval (car (cdr e)))))


Here, for example, tagF is $F_k$ and .tagF. is $F_e$. We define our typed self-interpreter tsi to be the term above specialized (by simple application) to the empty environment.

It is folklore that tsi is an indirect interpreter and we do not prove it here. 
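To make the tagging overhead concrete, the following Python sketch mirrors the shape of the Scheme excerpt for the lambda, app and tagF cases, plus a hypothetical `const` form for literal data. Syntax is a nested tuple, universal values are `("E", d)` / `("F", f)` pairs, and `interp`, `tsi` and `empty_env` are illustrative names, not the authors' code. Every lambda the interpreter evaluates acquires an F tag and every application strips one again; this is the overhead that tag elimination targets.

```python
from typing import Any, Callable

Value = tuple                     # universal values: ("E", data) or ("F", function)
Env = Callable[[str], Value]


def tag_E(d: Any) -> Value:
    return ("E", d)


def tag_F(f: Callable[[Value], Value]) -> Value:
    return ("F", f)


def untag(expected: str, v: Value) -> Any:
    """Strip a tag, failing at run time if it is not the expected one."""
    tag, payload = v
    if tag != expected:
        raise TypeError(f"expected {expected}-tagged value, got {tag}")
    return payload


def interp(e: Any, env: Env) -> Value:
    """Evaluate the syntax e to a tagged universal value."""
    if isinstance(e, str):                       # variable: look it up
        return env(e)
    op = e[0]
    if op == "const":                            # literal data gets an E tag
        return tag_E(e[1])
    if op == "lambda":                           # interpreter adds an F tag
        _, x, body = e
        return tag_F(lambda v: interp(body, lambda y: v if y == x else env(y)))
    if op == "app":                              # interpreter strips an F tag
        return untag("F", interp(e[1], env))(interp(e[2], env))
    if op == "tagF":                             # the program's own F tag,
        return tag_F(untag("F", interp(e[1], env)))  # over the interpreter's untag
    raise ValueError(f"unknown syntactic form {op!r}")


def empty_env(name: str) -> Value:
    raise NameError(f"unbound variable {name}")


def tsi(e: Any) -> Value:
    """Analogue of tsi: the interpreter applied to the empty environment."""
    return interp(e, empty_env)


K = ("lambda", "x", ("lambda", "y", "x"))
print(tsi(("app", ("app", K, ("const", 1)), ("const", 2))))   # ('E', 1)
```

Applying a non-function, as in `tsi(("app", ("const", 1), ("const", 2)))`, fails with a `TypeError` at the untagging step, which is how the universal datatype reports type errors at run time.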



6 Staged Interpreters and Translation 



We mentioned in the introduction that a staged interpreter can be viewed simply as a translation. This is a subtle shift in perspective. In particular, the only requirement on interpreters is that they yield “the right value” from a program. Often, the straightforward implementation of interpreters (in both CBN and CBV programming languages) has a pragmatic disadvantage: it does not ensure a clean separation between the various “stages” of computing “the right value” of an expression. In particular, straightforward implementations of interpreters tend to repeatedly traverse the expression being interpreted. Ideally, one would like this traversal to be done once and for all. In general, achieving this kind of separation gives rise to the need for two- and multi-level languages.

“Staged interpreters”, therefore, are not just any composition of the functions described above, but rather a particular implementation of this composition that behaves in a certain manner. Because CBN and CBV functional languages cannot force evaluation under lambda, they are thought to be insufficient for expressing staging. Nevertheless, the result of a staged interpreter (which is a term in the target language corresponding to the given term in the subject language) is expressible in the language. Furthermore, the result of a staged interpreter is also observationally equivalent to the result of an interpreter. These facts imply that, while staged interpreters are not known to be expressible in a language such as the one we are studying in this paper, they are still admissible.

Pragmatic experience with staged interpreters suggests that their input-output behavior can be modeled rather straightforwardly as a translator. For simplicity, it is enough to focus on a self-interpreter: an interpreter written in the same language it interprets. When staged, a self-interpreter translates terms in one language into terms in the same language. Note, however, that for typing reasons, the resulting translation is not the identity. Rather, it is the following function:
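$$\begin{aligned}
\mathcal{S}(x) &= x\\
\mathcal{S}(e_1\,e_2) &= (F_e^{-1}\,\mathcal{S}(e_1))\,\mathcal{S}(e_2)\\
\mathcal{S}(\lambda x.e) &= F_e\,(\lambda x.\mathcal{S}(e))\\
\mathcal{S}(\mathrm{fix}\ x.e) &= \mathrm{fix}\ x.\mathcal{S}(e)\\
\mathcal{S}(\mathtt{'}s) &= E_e\,\mathtt{'}s\\
\mathcal{S}(o\ e) &= E_e\,(o\,(E_e^{-1}\,\mathcal{S}(e)))\\
\mathcal{S}(u\ e_1\,e_2) &= E_e\,(u\,(E_e^{-1}\,\mathcal{S}(e_1))\,(E_e^{-1}\,\mathcal{S}(e_2)))\\
\mathcal{S}(\mathrm{if}\ e_1\,e_2\,e_3) &= \mathrm{if}\ (E_e^{-1}\,\mathcal{S}(e_1))\ \mathcal{S}(e_2)\ \mathcal{S}(e_3)\\
\mathcal{S}(E\,e) &= E_k\,(E_e^{-1}\,\mathcal{S}(e))\\
\mathcal{S}(E^{-1}\,e) &= E_e\,(E_k^{-1}\,\mathcal{S}(e))\\
\mathcal{S}(F\,e) &= F_k\,(F_e^{-1}\,\mathcal{S}(e))\\
\mathcal{S}(F^{-1}\,e) &= F_e\,(F_k^{-1}\,\mathcal{S}(e))
\end{aligned}$$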



One can show that $|\mathcal{S}(e)| \approx \mathrm{tsi}\ \ulcorner e \urcorner$. It is reasonable to expect that a staged tsi produces $|\mathcal{S}(e)|$ when applied to $\ulcorner e \urcorner$, and our implementation confirms this.
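Because the translation is purely syntactic, it can be transcribed almost equation by equation. The Python sketch below assumes a hypothetical tuple-based AST (the names `S`, `erase_e`, `tag` and `untag` are ours). Alongside $\mathcal{S}$ it defines the erasure that drops e-marked operations and turns k-marked ones back into subject syntax, so the assertions can check the second part of Lemma 10, $\|\mathcal{S}(e)\|_e = e$, on sample terms.

```python
from typing import Any

# Hypothetical AST. Subject terms: variables (str), ("app", e1, e2),
# ("lam", x, e), ("fix", x, e), ("const", s), ("op1", o, e),
# ("op2", u, e1, e2), ("if", e1, e2, e3), and the tag operations
# ("E", e), ("Einv", e), ("F", e), ("Finv", e).
# Annotated terms write tag operations as ("tag" | "untag", c, ann, e)
# with c in {"E", "F"} and ann in {"e", "k"}.


def tag(c: str, ann: str, t: Any) -> tuple:
    return ("tag", c, ann, t)


def untag(c: str, ann: str, t: Any) -> tuple:
    return ("untag", c, ann, t)


def S(e: Any) -> Any:
    """The translation S, one branch per equation."""
    if isinstance(e, str):                                   # S(x) = x
        return e
    op = e[0]
    if op == "app":
        return ("app", untag("F", "e", S(e[1])), S(e[2]))
    if op == "lam":
        return tag("F", "e", ("lam", e[1], S(e[2])))
    if op == "fix":
        return ("fix", e[1], S(e[2]))
    if op == "const":
        return tag("E", "e", e)
    if op == "op1":
        return tag("E", "e", ("op1", e[1], untag("E", "e", S(e[2]))))
    if op == "op2":
        return tag("E", "e", ("op2", e[1], untag("E", "e", S(e[2])),
                              untag("E", "e", S(e[3]))))
    if op == "if":
        return ("if", untag("E", "e", S(e[1])), S(e[2]), S(e[3]))
    if op in ("E", "F"):          # program's own tag: kept (k) over an e-untag
        return tag(op, "k", untag(op, "e", S(e[1])))
    if op in ("Einv", "Finv"):    # program's own untag: kept (k) under an e-tag
        c = op[0]
        return tag(c, "e", untag(c, "k", S(e[1])))
    raise ValueError(f"unknown form {op!r}")


def erase_e(a: Any) -> Any:
    """||a||_e: drop e-marked operations, restore k-marked ones as subject syntax."""
    if isinstance(a, str) or a[0] == "const":
        return a
    if a[0] in ("tag", "untag"):
        kind, c, ann, body = a
        inner = erase_e(body)
        if ann == "e":
            return inner
        return (c, inner) if kind == "tag" else (c + "inv", inner)
    return (a[0],) + tuple(erase_e(x) for x in a[1:])


print(S(("const", 7)))   # ('tag', 'E', 'e', ('const', 7))
```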

The annotated type of the result of translating a term of type $t$ is defined as follows:

$$\mathcal{S}(D) = E_e\,D, \qquad \mathcal{S}(V) = V, \qquad \mathcal{S}(t_1 \to t_2) = F_e(\mathcal{S}(t_1) \to \mathcal{S}(t_2))$$

The fact that the translation maps $V$ to itself is essential for being able to carry out the various proofs by induction on the structure of the types.
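The type translation can be sketched in the same style. In this hypothetical tuple encoding (following our reading of the equations above), `("E_e", "D")` stands for $E_e\,D$ and `("F_e", ("->", a1, a2))` for $F_e(a_1 \to a_2)$; the usage shows that `"V"` passes through unchanged, which is the property the induction proofs rely on.

```python
from typing import Any


def S_type(t: Any) -> Any:
    """Annotated type of a translated term:
    S(D) = E_e D,  S(V) = V,  S(t1 -> t2) = F_e(S(t1) -> S(t2))."""
    if t == "D":
        return ("E_e", "D")
    if t == "V":
        return "V"                 # no recursion needed at V
    _, t1, t2 = t
    return ("F_e", ("->", S_type(t1), S_type(t2)))


print(S_type(("->", "V", "V")))   # ('F_e', ('->', 'V', 'V'))
```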

Lemma 9 (Soundness (and Full-Abstraction) of Translation).

$$e_1 \approx e_2 \iff |\mathcal{S}(e_1)| \approx |\mathcal{S}(e_2)|$$

Proof. By showing that $\mathcal{S}(e) \approx W_t\,e$, together with the Projecting Embeddings lemma (Lemma 8). □



Lemma 10 (Well-Typed Terms “Go Through”).

1. $\Gamma \vdash e : t \implies \mathcal{S}(\Gamma) \vdash \mathcal{S}(e)/\mathcal{S}(t)$
2. $\|\mathcal{S}(e)\|_e = e$

Proof. Both by a simple induction on the structure of $e$. □

The first part of this lemma means that running the staged interpreter on a well- 
typed subject program yields a term that passes the tag elimination analysis. 
The second part means that erasing the operations marked by e yields back 
precisely the term that we started with. 

Note further that, using the second part, we can strengthen the first part of this lemma to:

$$\Gamma \vdash e : t \iff \mathcal{S}(\Gamma) \vdash \mathcal{S}(e)/\mathcal{S}(t)$$

This statement means that applying the tag elimination analysis to the result 
of a staged self-interpreter is exactly the same as type-checking the term being 
interpreted. This is probably the most accurate characterization of the strength 
of the idea of tag-elimination. 



Footnote 10: Now we can see how $U$ and $W$ generalize $P$ and $E$. For example, $W_{\mathcal{S}(t)} = E_t$.
7 Jones-Optimal Specialization

At this point, we have presented a variety of results indicating that tag elimination has a useful application in the context of self-application of a specific typed programming language, and would therefore be useful in improving the effectiveness of traditional partial evaluators in staging many interesting interpreters written in this language. We now turn to addressing the long-standing open problem of Jones-optimality formally. A function PE is a partial evaluator if, for all closed $e_1$ and $e_2$:

$$\mathrm{PE}(e_1, e_2) \approx e_1\,e_2$$
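The defining equation only asks for observational equivalence, so even a partial evaluator that does no work satisfies it; what separates useful partial evaluators, and what Jones-optimality measures, is the shape of the residual program. The Python sketch below illustrates this with programs represented as source strings (an assumption): `trivial_pe` merely builds the application $e_1\,e_2$, while `pe_power` is a hypothetical hand-written specializer for one fixed program that unrolls the static exponent.

```python
# Subject program e1 = λn.λx. x^n, given as Python source text (assumption).
POWER_SRC = "lambda n: lambda x: x ** n"


def trivial_pe(e1_src: str, e2_src: str) -> str:
    """Satisfies PE(e1, e2) ~ e1 e2 by simply building the application."""
    return f"({e1_src})({e2_src})"


def pe_power(n: int) -> str:
    """Hypothetical specializer for POWER_SRC only: unrolls x ** n for a static n."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return f"lambda x: {body}"


print(trivial_pe(POWER_SRC, "3"))   # (lambda n: lambda x: x ** n)(3)
print(pe_power(3))                  # lambda x: x * x * x
```

Both residual programs compute the same function of `x`, so both meet the equation; only the second has removed the interpretive work on the static argument.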

A partial evaluator is partially-correct if it is a partial function satisfying the above equation when defined. Note that we require $e_1$ and $e_2$ to be closed only for simplicity, as partial evaluators are syntactic operations and therefore must deal with free variables anyway. A function tPE is a typed partial evaluator if

$$\vdash e_1\,e_2 : |a| \implies \mathrm{tPE}(e_1, e_2, a) \approx U_a(e_1\,e_2)$$

A typed partial evaluator is partially-correct if it is a partial function satisfying the above equation when defined. This definition of a typed partial evaluator is motivated by the definition of a self-interpreter in a typed programming language. A self-interpreter si is a term such that:

$$\mathrm{si}\ \ulcorner e \urcorner \approx e$$

A typed self-interpreter tsi is a term such that:

$$\vdash e : t \implies \mathrm{tsi}\ \ulcorner e \urcorner \approx W_t\,e$$

where $W_t$ is a universal embedding function. Now we can recapitulate the definition of Jones-optimality [?]. A partial evaluator PE is Jones-optimal with respect to an untyped self-interpreter si when, for all $\vdash e : t$, we have

$$\mathrm{PE}(\mathrm{si}, \ulcorner e \urcorner) = e$$

Again motivated by the role that a universal datatype plays in typed interpreters, we generalize the definition of Jones-optimality to the typed setting as follows: a typed partial evaluator tPE is said to be Jones-optimal with respect to a typed self-interpreter tsi when, for all $\vdash e : t$,

$$\mathrm{tPE}(\mathrm{tsi}, \ulcorner e \urcorner, \mathcal{S}(t)) = e$$



Theorem 2 (Main).

1. Whenever PE(_, _) is a (partially) correct partial evaluator, TE(PE(_, _), _) is a (partially) correct typed partial evaluator; furthermore,
2. whenever, for all $e$, it is the case that $\mathrm{PE}(\mathrm{tsi}, \ulcorner e \urcorner) = \mathcal{S}(e)$, then TE(PE(_, _), _) is Jones-optimal.

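Proof. For the first part, all we need is to follow a simple sequence of semantic equalities:

$$\begin{aligned}
& \mathrm{TE}(\mathrm{PE}(e_1, e_2), a)\\
\approx\ & U_a(\mathrm{PE}(e_1, e_2)) && \text{by the extensional semantics of TE}\\
\approx\ & U_a(e_1\,e_2) && \text{by the definition of a PE}
\end{aligned}$$

and we have satisfied the definition of a tPE.

For the second part, we only have to follow a simple sequence of syntactic equalities:

$$\begin{aligned}
& \mathrm{TE}(\mathrm{PE}(\mathrm{tsi}, \ulcorner e \urcorner), \mathcal{S}(t))\\
=\ & \mathrm{TE}(\mathcal{S}(e), \mathcal{S}(t)) && \text{by assumption}\\
=\ & \|\mathcal{S}(e)\|_e && \text{by Lemma 10.1, } \vdash \mathcal{S}(e)/\mathcal{S}(t)\text{, so TE ``succeeds''}\\
=\ & e && \text{by Lemma 10.2}
\end{aligned}$$

and we have satisfied the definition of typed Jones-optimality. □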

We have built an implementation that supports this result. 

8 Conclusions and Future Work 

In this paper, we have presented theoretical results showing how Jones-optimality is achieved using tag elimination. We have also implemented a system based on the analysis presented here (in Scheme), and it has validated our theoretical results. The analysis we presented here contains technical improvements over the original proposal [?] in that it uses a simpler judgment. The main reason for this simplicity is that we exploit information about the well-formedness of annotated types in the judgment. However, it is also more specialized than the original analysis, which is parametric over an arbitrary datatype that we might want to eliminate.

The moral of the present work is that there is a practical solution to the problem of Jones-optimality, which can be attained through some simple annotations by the user. There is evidence that the annotations may not be necessary in practice. Makholm [?] implemented a variant of tag elimination for a first-order language whose type structure is different from that of the language we use here. In this implementation, the analogs of our e and k annotations are inferred automatically by the tag eliminator instead of being embedded in the staged interpreter. In principle, it seems that such inference in the setting presented in this paper should be decidable, although not necessarily efficient. Whether or not efficient and practical inference will scale to the higher-order setting is not known. The work on dynamic typing [?] may help establish such a result formally. Combinations of wrapper and unwrapper functions provide natural mechanisms for a notion of subsumption or subtyping that can be used to provide an analog of soft typing. Finally, we hope to generalize this work to richer settings with state and polymorphism.

The system can be downloaded from http://www.diku.dk/~makholm/teaftar.gz.

In fact, in this paper, we have only talked about well-formed annotated types. There is a general way to go from the original definition of well-formedness to the kind of presentation given here, but it is beyond the space available here.
McAllester [?] has recently shown that the running time of a bottom-up logic program can be bounded by the number of “prefix firings” of its inference rules. This theorem allows one to view a logic program as an algorithm whose running time is given by the number of prefix firings of the rules. Although pure logic programs under prefix-firing running time are adequate for many algorithms, many other algorithms seem to lie outside of this framework.

Here we extend the concept of inference systems to one in which atoms 
can also be deleted in the course of application of an inference rule. Deletion 
makes the behavior of the algorithm nondeterministic. For example, consider 
the following rules with deletion where the marking [. . .] means that the premise 
is to be deleted as soon as the rule is applied. 

$$P \Rightarrow Q \qquad [Q] \Rightarrow S \qquad [Q] \Rightarrow W$$

Suppose the initial data base contains only P. The first rule fires, adding the 
assertion Q. Now either the second or third rule can fire. Since each of these 
rules deletes Q, once one of them fires, the other is blocked. Hence the final 
data base is either {P, S} or {P, W}, nondeterministically. When viewing rules 
with deletions as algorithms, this nondeterminism is viewed as “don’t care” 
nondeterminism — the choices are made arbitrarily and irrevocably. On the other 
hand, the time needed to compute a deductive closure of a data base might 
crucially depend on which execution path is chosen. To gain control over this 
phenomenon, we allow priorities to be attached to the rules, specifying the order 
in which rules should fire. 
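The behavior of rules with deletion and priorities can be simulated with a small forward-chaining loop. In this Python sketch (the names `Rule` and `run` are hypothetical), a rule fires when all its premises are present; premises marked for deletion are removed, and among the firable rules one with the best priority is chosen arbitrarily. We additionally assume that an atom, once asserted, is never re-derived; without this assumption the rule $P \Rightarrow Q$ would fire again after $Q$ is deleted, and the example would end with both $S$ and $W$.

```python
import random
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Rule(NamedTuple):
    keep: FrozenSet[str]        # premises that remain in the data base
    delete: FrozenSet[str]      # premises written [..]: removed when the rule fires
    conclusion: str
    priority: int = 0           # smaller number = higher priority (our convention)


def run(initial: Iterable[str], rules: List[Rule], rng: random.Random) -> FrozenSet[str]:
    db: Set[str] = set(initial)     # atoms currently in the data base
    seen: Set[str] = set(initial)   # atoms ever asserted; never re-derived (assumption)
    while True:
        firable = [r for r in rules
                   if (r.keep | r.delete) <= db and r.conclusion not in seen]
        if not firable:
            return frozenset(db)
        best = min(r.priority for r in firable)
        # "Don't care" nondeterminism: pick arbitrarily among the top-priority rules.
        rule = rng.choice([r for r in firable if r.priority == best])
        db -= rule.delete
        db.add(rule.conclusion)
        seen.add(rule.conclusion)


P_Q = Rule(frozenset({"P"}), frozenset(), "Q")
Q_S = Rule(frozenset(), frozenset({"Q"}), "S")
Q_W = Rule(frozenset(), frozenset({"Q"}), "W")

outcomes = {run({"P"}, [P_Q, Q_S, Q_W], random.Random(seed)) for seed in range(20)}
print(sorted(sorted(o) for o in outcomes))
```

Without priorities, different seeds can end in either final data base; giving the $[Q] \Rightarrow S$ rule precedence, for example with `Q_S._replace(priority=-1)`, makes the result deterministic.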

We prove a general meta-complexity theorem for such inference systems. This 
theorem can be viewed as establishing a prefix-firing notion of running time for 
programs with deletion and priorities. We show that under this notion of running 
time one can give efficient logic-program implementations of union-find and other 
algorithms, such as congruence closure, which depend on union-find. 

Logic programs with deletion and priorities, although more complex than pure logic programs, still seem simpler and more easily analyzed than classical pseudo-code based on iteration and recursion. As an example we give a high-level formulation of Dijkstra’s shortest path algorithm. This algorithm requires priorities for rule instances rather than rule schemas. A problem requiring priorities only on the level of rule schemas is ground Horn satisfiability in the presence of equality. We give an algorithm that, to our knowledge, is the first $O(n \log n)$ algorithm for this problem.

* The results reported here were obtained in collaboration with David McAllester, AT&T Labs-Research.
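The priority-driven view of Dijkstra's algorithm can be illustrated in ordinary Python. This is the conventional heap-based algorithm, not the authors' logic program: the comments only indicate how the heap plays the role of rule-instance priorities (an instance derived from a fact dist(x, d) fires in order of d) and how replacing a larger tentative distance corresponds to deletion. Edge costs are assumed to be non-negative, and the names `shortest_paths` and `Graph` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]   # node -> [(successor, non-negative cost)]


def shortest_paths(graph: Graph, source: str) -> Dict[str, int]:
    dist: Dict[str, int] = {source: 0}             # current dist(x, d) facts, one per node
    agenda: List[Tuple[int, str]] = [(0, source)]  # pending instances, keyed by priority d
    settled = set()
    while agenda:
        d, x = heapq.heappop(agenda)               # fire the smallest-d instance first
        if x in settled:
            continue                               # stale: dist(x, d) was superseded
        settled.add(x)
        # Rule instance: dist(x, d), edge(x, y, c) => dist(y, d + c)
        for y, c in graph.get(x, []):
            nd = d + c
            if nd < dist.get(y, math.inf):         # smaller dist(y, nd) replaces the larger one
                dist[y] = nd
                heapq.heappush(agenda, (nd, y))
    return dist


g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(shortest_paths(g, "s"))   # {'s': 0, 'a': 1, 'b': 3, 'c': 4}
```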